In a classical digital library (DL), a user searching for resources might use a simple keyword search. Digital libraries usually also provide a catalog or a classification hierarchy, which is a very effective way of organizing the resources. It helps the user to do a focused search quickly and to narrow down the scope. Alternatively, if the user already knows something about the resource he is trying to obtain, searching by metadata might work well.

But, digital libraries contain a variety of resources. What if the user wants to search for photographs, or maps? Would the previously mentioned search techniques be effective in this situation? Also, digital libraries might contain a lot of very long documents or historical text. What if the user wants to search for some historical events?

As another example, if a user wants to search for some dates, or for books that cover some time period, showing the date distribution for each book will be more useful than simply listing the contents. A list of contents might be too much for the user, especially if the user is looking at a few books to decide which one is most relevant.

We feel that the best, or rather the most effective, way to search for resources is to search by their most defining, or characterizing, attribute. So, let’s ask ourselves: what are the characterizing attributes of these resources? You might say that photographs and maps have a particular location attached. As for events, we normally talk about an event that happened at a particular place, at a particular time, involving certain people or other entities.

This brings us to our topic: “Spatial and Temporal Digital Libraries”. From the name, you can guess that resources in these digital libraries have a spatial (that is, geographical) or temporal (that is, time) attribute.

-------------------------------------------------

In traditional digital library (DL) models, to search for particular resources, users may do a keyword search, or search using some metadata, such as author name, title, etc. Alternatively, some DL systems categorize their resources according to a catalog or classification scheme (e.g. by subject area) so that users can focus their search on relevant materials. However, a lot of resources (such as photographs, maps, etc.) have spatial or temporal attributes. For example, a photograph is taken at a specific time, at a specific geographic location. It is not very intuitive to search for these kinds of resources using, say, keywords.

Also, any DL that has existed for some period of time is bound to store resources that have a temporal, or time, attribute. Example: a DL storing news articles from the past 20 years. Alternatively, the DL might store a lot of historical articles, or very long historical documents or books. Users of these DLs might be interested to know what the major past events were. Where did they happen? When did they happen? Who were the people involved in the event, etc.?

As an example, imagine that a user wants to know what major events happened in England in the late 18th century. If the user can only do a simple keyword search, the DL might return a long list of results. The user is then forced to look through the result list. If each of the returned results is a thick book, or a very long document, the user will need to spend even more time on the search. Questions to ask are: can there be a more suitable search interface? Is there any way in which the DL can help the user to visualize the returned results? Can we have more visually intuitive search interfaces?

Digital libraries contain a lot of information. In spatial and temporal digital libraries, resources have a location attribute, a time attribute, or both. We have just discussed that the traditional ways of searching might not be sufficient. So, we are faced with the question: how best to access these resources?

For example, if you want to view or search resources that have a place attribute, it doesn’t really make sense to have them presented in a list, say country by country. It would be more intuitive to show them on a map, so that at a glance you can see where the resources are concentrated.

As another example, if we want to search for shops in a particular district, it will be very hard to use keywords to specify the exact geographic region that our shops are in. It will be easier and more efficient if we can start from a map showing a large region, then progressively navigate deeper and deeper into the map, so that we get to more and more specific regions. Also, we might not know the specific names of the streets or shops, so we can’t really specify them to a keyword interface.

Why do people use keywords for search? It is not a natural way to express a user’s needs. The chain looks like this: user needs -> formulating a concrete question -> keywords ... interaction between the user and the IR system ... keywords -> matching documents -> the pieces of those documents that correspond to the keywords. You see, each transition is a potential loss of information. Also, the user needs to do a lot of work in order to get the resources. Why couldn’t the user specify his information needs in a more direct way? And in our case, why couldn’t the user search directly by choosing some region, or some important dates or names?

Now, we will discuss how some of these needs to obtain and visualize resources that have geographic or temporal attributes are currently addressed.

------------------------------------------------------------------------

Q: can we apply the techniques of S&T DL for general DL? (referring to the Perseus example of classification)

The (only?) way to present a long document to the user is to show him, in a visual format, the distribution of some important elements (names, dates, etc.).

Now, we will cover the various ways in which users are allowed to perform searches in spatial and temporal DL.

Keywords:

This is a very fundamental search method that is present also in traditional digital libraries.

Navigational classification hierarchy:

This search method should also be familiar to most of us. There’s a digital library system called the Digital Library for Earth System Education (DLESE for short) which employs this kind of search interface. DLESE focuses on providing shared geoscience materials to the educational community, to support teaching and learning about the Earth system. Users can search for materials primarily in 2 ways: using keywords, and by navigating its classification hierarchy.

This picture shows a portion of DLESE resources organized according to subject. Alternatively, the user may choose to view resources organized by grade level or resource type. Now, let’s assume the user selects agricultural science. The user will then be able to view materials, or documents, pertaining to agriculture.

Timeline:

Now, imagine the library has a collection of many volumes of historical text about the history of England, and you are doing a mini-project that requires you to give a summary of the major events in England in, say, the late 18th century. If the digital library only provides a simple keyword search, you will have a hard time deciding which volume of text to go through and read. But what if the digital library goes 1 step further by first extracting dates from the text in the various volumes, and then plotting these dates against the volumes in which they appear? This should be very helpful in your search. Now, let’s see an example of a digital library system that does this.

The Perseus project is a digital library of resources for the study of the humanities, and naturally, historical text documents form the major resource type. To handle the large volumes of textual data, methods were explored to automatically tag dates and place names in the documents. Dates were also plotted against the volumes in which they appeared. Let me show you an example. (show) In the figure, the vertical axis lists the various text volumes. The horizontal axis shows the timeline. Now, let’s look at the dots in the figure. You can see that this particular volume (Russell) talks about history starting from around the middle of the 18th century, whereas this volume, Frank E. Smith, mainly focuses on history nearing the end of the 18th century. So, to satisfy your need for a summary of the major events in the late 18th century, you might want to have a look at this particular book. Actually, in Perseus, you can also click on the timeline to retrieve the relevant sections of the text document.
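To make the idea of a date distribution concrete, here is a minimal sketch of how years could be pulled out of each volume and counted per decade. This is not the method Perseus actually uses; the regular expression, the function name `year_histogram`, and the two sample volumes are assumptions made up for illustration.

```python
import re
from collections import Counter

# Assumption: a "date" is any standalone four-digit year from 1000 to 2099.
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")

def year_histogram(text, bucket=10):
    """Count the years mentioned in `text`, grouped into `bucket`-year bins."""
    years = (int(m.group(1)) for m in YEAR_RE.finditer(text))
    return Counter(y - y % bucket for y in years)

# Hypothetical volumes; real input would be the full text of each book.
volumes = {
    "Volume A": "In 1752 the calendar changed. By 1760 a new king reigned. In 1765 ...",
    "Volume B": "The riots of 1780 were followed by reforms in 1784 and war in 1793.",
}

# One text row per volume: the textual equivalent of one row of dots on a timeline.
for title, text in volumes.items():
    hist = year_histogram(text)
    row = " ".join(f"{decade}s:{count}" for decade, count in sorted(hist.items()))
    print(f"{title}: {row}")
```

Each printed row plays the role of one row of dots in the timeline figure: a user can see which volume concentrates on which decades without opening it. Real date tagging is harder than this, since a four-digit number is not always a year and many dates are written out in words.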

Informative map-interface:

Just now we mentioned that Perseus plots dates against the different volumes, so that at a glance, users can see the temporal shift across different volumes. We can imagine that the same can be done for place names. Places detected in the text can be plotted on a map. Let me show you an example. (show) This map shows the various places detected in a particular volume. To be specific, this volume talks about the history of London. Notice the concentration of dots in England. (show) This is another example. Notice that this time round, the concentration is on North America.
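As a rough sketch of how detected place names could be turned into dots on a map, the snippet below looks up names in a tiny gazetteer and returns one point per place with a mention count. The gazetteer contents, the approximate coordinates, and the function name `place_points` are assumptions for illustration, not part of Perseus.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer: place name -> (latitude, longitude), approximate.
GAZETTEER = {
    "London": (51.51, -0.13),
    "Oxford": (51.75, -1.26),
    "Boston": (42.36, -71.06),
}

def place_points(text):
    """Return (name, lat, lon, mentions) for each gazetteer place found in text."""
    counts = Counter()
    for name in GAZETTEER:
        # Whole-word match only, so "Londoners" does not count as "London".
        counts[name] = len(re.findall(rf"\b{re.escape(name)}\b", text))
    return [(n, *GAZETTEER[n], c) for n, c in counts.items() if c > 0]

points = place_points("London grew quickly; ships left London for Boston.")
for name, lat, lon, mentions in points:
    print(f"{name}: ({lat}, {lon}) x{mentions}")
```

A plain lookup like this cannot tell apart places that share a name (Boston in Lincolnshire versus Boston in Massachusetts, for example), which is one reason automatic place tagging needs more context than a name list.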

Navigational map-interface:

The last search interface we want to discuss is the navigational map interface. We’ve mentioned earlier that many resources, such as aerial photographs, images, maps, historical text, etc., have geographical attributes. It was realized that instead of relying upon the traditional keyword search and hierarchical classification scheme for access to these resources, a novel and more effective approach would be to make use of a map-based interface. (show) By looking at the figure, you can get an idea of what we mean by a navigational map interface. This picture is taken from the webpage of the Alexandria Digital Library (ADL). ADL’s collection focuses on information supporting the Earth and Social Sciences, which includes resources such as aerial photographs, world maps, etc. In ADL, the main search interface is this map browser. The primary function of the map browser is to let the user define a spatial region as one of the features to search on. For example, in this figure, we’ve selected the USA, which means we would only like to obtain resources which have some connection to America. Notice the square grid in the figure. The map browser allows us to go deeper inside the map, so that we can define a more specific geographic region.
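The behaviour just described (pick a region, then zoom into a grid cell to narrow it) can be sketched with a simple bounding box. This is only an illustration under our own assumptions, not ADL’s implementation: the `BBox` class, the `search` function, the rough USA rectangle, and the sample resources are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BBox:
    """A latitude/longitude rectangle in degrees (ignores the antimeridian)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east

    def quadrant(self, row, col):
        """Zoom in: return one of four sub-cells (row 0 = north, col 0 = west)."""
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        south, north = (mid_lat, self.north) if row == 0 else (self.south, mid_lat)
        west, east = (self.west, mid_lon) if col == 0 else (mid_lon, self.east)
        return BBox(south, west, north, east)

def search(resources, region):
    """Return titles of resources whose location falls inside the region."""
    return [r["title"] for r in resources if region.contains(r["lat"], r["lon"])]

# Hypothetical resources with approximate coordinates.
resources = [
    {"title": "Aerial photo, Santa Barbara", "lat": 34.42, "lon": -119.70},
    {"title": "Street map, New York", "lat": 40.71, "lon": -74.01},
    {"title": "Survey map, London", "lat": 51.51, "lon": -0.13},
]

usa = BBox(south=24.0, west=-125.0, north=50.0, east=-66.0)  # rough rectangle
print(search(resources, usa))                  # both US resources
print(search(resources, usa.quadrant(1, 0)))   # south-west cell only
```

Each call to `quadrant` corresponds to clicking one grid cell in the map browser, so the user narrows the region without ever typing a place name.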

We have talked about 3 digital libraries: the Alexandria Digital Library, Perseus, and DLESE. These digital libraries store resources that have spatial or temporal attributes, and we have discussed the search interfaces that they provide.

Now, let me give a summary.

For the second half of our presentation, we will start off by discussing why it is important to extract named entities such as dates, times, locations, names, etc. A method to perform named entity extraction will also be discussed.

Each text is analyzed before the extraction stage. Each word is assigned 4 features: lemma, lexical category, case and semantic category. Rules are extracted in the form of a window with w words to the left and w words to the right of a boundary of a named entity (table 1). Each rule can be presented as a table.
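The word features and the window around an entity boundary can be pictured with the small sketch below. The field names, the feature values, and the `window` helper are our own assumptions for illustration; only the four features and the idea of w words on each side come from the method described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    lemma: str
    pos: str   # lexical category, e.g. "NOUN" (label set is an assumption)
    case: str  # "lower", "upper" or "title"
    sem: str   # semantic category, e.g. "month" (label set is an assumption)

def window(tokens, boundary, w):
    """Return the w tokens before and the w tokens after a boundary.

    `boundary` is a position between tokens: 0 is before the first token.
    Positions that fall outside the text are filled with None.
    """
    left = [tokens[i] if i >= 0 else None for i in range(boundary - w, boundary)]
    right = [tokens[i] if i < len(tokens) else None
             for i in range(boundary, boundary + w)]
    return left, right

# "on 4 July 1776": the date entity starts at position 1 (before "4").
tokens = [
    Token("on", "PREP", "lower", "none"),
    Token("4", "NUM", "lower", "number"),
    Token("July", "NOUN", "title", "month"),
    Token("1776", "NUM", "lower", "year"),
]
left, right = window(tokens, boundary=1, w=2)
print([t.lemma if t else None for t in left])   # words before the boundary
print([t.lemma if t else None for t in right])  # words after the boundary
```

Written out row by row, with one row per window position and one column per feature, this is the table form in which a rule can be presented.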

There are 2 steps in the training process. First, a set of tagging rules is learned. The next stage consists of inducing metarules, which correct mistakes and imprecision in the tagging process.

First, users create a set of training texts for a domain. They mark positive examples of the relevant named entities. The rest of the corpus is considered a pool of negative examples.

The algorithm goes through the training stage using this corpus.

Each tagging rule is induced for only one boundary of a named entity, either the left or the right. For every positive example, the algorithm performs several steps:

1. build initial rule

2. generalize rule

3. keep k best generalizations of the initial rule
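The three steps can be sketched in a few lines. This is a simplified illustration under our own assumptions, not the actual algorithm: a window is a list of feature dictionaries, a rule is a set of (position, feature, value) constraints, generalizing means dropping constraints, and the ranking (fewest wrong matches, then most positive examples covered, then fewest constraints) is a scoring choice we made up for the example.

```python
from itertools import combinations

def initial_rule(win):
    """Step 1: constrain every feature of every token in the window."""
    return frozenset((pos, feat, val)
                     for pos, tok in enumerate(win)
                     for feat, val in tok.items())

def matches(rule, win):
    """A window matches when it satisfies every constraint in the rule."""
    return all(pos < len(win) and win[pos].get(feat) == val
               for pos, feat, val in rule)

def generalizations(rule):
    """Step 2: relax the rule by keeping only a subset of its constraints."""
    items = sorted(rule)
    for size in range(1, len(items) + 1):  # exponential; fine for a toy window
        for kept in combinations(items, size):
            yield frozenset(kept)

def k_best(rule, positives, negatives, k):
    """Step 3: rank the generalizations and keep the k best."""
    def score(r):
        covered = sum(matches(r, w) for w in positives)
        errors = sum(matches(r, w) for w in negatives)
        return (errors, -covered, len(r))
    # sorted(r) is only a tie-breaker, to make the ranking deterministic.
    return sorted(generalizations(rule), key=lambda r: (score(r), sorted(r)))[:k]

# Hypothetical one-word windows just after the left boundary of a date.
positives = [[{"lemma": "july", "sem": "month"}],
             [{"lemma": "march", "sem": "month"}]]
negatives = [[{"lemma": "march", "sem": "verb"}],
             [{"lemma": "walk", "sem": "verb"}]]

best = k_best(initial_rule(positives[0]), positives, negatives, k=1)
print(best)
```

In this toy case the surviving rule only requires the semantic category to be "month": it covers both positive examples and none of the negatives, whereas the initial rule, which also fixes the lemma, covers only the example it was built from.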