
(Last updated on: Tue Sep 13 16:11:30 GMT-8 2005 )

HW #1: Building a digital library with Greenstone

As we learn about the non-technical aspects of the digital library, you may realize that much of the power of a library comes from the effort that information professionals put into editing and selecting documents for their library. In this assignment, you will be responsible for creating a small digital library, simulating (on a toy scale) the roles that both the acquisition and cataloging departments play in a traditional library.

As digital libraries have become more popular, so has the software to build them. The result is that we now have some very useful frameworks to collect, curate and publish digital collections. Greenstone is one such package, built by the folks at the University of Waikato, one of the leading research universities in the information retrieval and digital library fields. There are others, but we will focus on this popular DL software distribution, since it has an open-source license.

Synopsis: You will build a small reference digital library on a topic of your own personal interest. You will annotate the documents with appropriate metadata and make the library distributable as a (public or private) website or as a CD-ROM.


To do this assignment you will need access to a computer:

You will also need a blank CD-ROM (or two, in case the first one fails to write correctly) for your submission.


First, pick a topic of interest to you personally. It can be anything you want, but I suggest a hobby or interest (for example, I would choose rock climbing; another possibility is to use your intended survey topic as your interest area). Search the web and the general digital libraries you have access to in order to determine the coverage for your topic. Perhaps the topic is too broad (there is too much information on it; e.g., I might narrow mine down to rock climbing in SE Asia).

Define the scope of your digital library collection to cover approximately 20 resources (e.g., web pages) from at least 10 different sources. The final number of documents is up to you, but it should cover the whole topic. At least 2 of the resources must not be web pages (but they may be web-accessible, e.g., publicly available .PDF files). Copyrighted documents are fine, but see the note later about what to turn in.

Note: collection building is a difficult process. Students have decided to change their sample collection for this assignment based on the resources that they have been able to gather. Keep a note of the locations of all the documents that you want to incorporate into your collection, as a text file of local file paths and URLs. This step will be quite time-consuming; expect to spend at least a couple of hours on it.
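If it helps, the scope targets above can be checked against your notes file with a few lines of Python. This is only a rough sketch under several assumptions of mine, none of which come from Greenstone or from the assignment: the notes file holds one URL or local path per line, lines starting with "#" are comments, a "source" is a web hostname (all local files count as a single source), and a resource is a "web page" if its file extension is in the list below. The function name is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions treated as ordinary web pages (an assumption; adjust as needed).
WEB_PAGE_EXTENSIONS = {"", ".html", ".htm", ".php", ".asp"}


def summarize_manifest(lines):
    """Count resources, distinct sources and non-web-page resources."""
    entries = [ln.strip() for ln in lines]
    entries = [e for e in entries if e and not e.startswith("#")]
    sources = set()
    non_web_pages = 0
    for entry in entries:
        parsed = urlparse(entry)
        if parsed.scheme in ("http", "https"):
            # For URLs, the hostname stands in for the "source".
            sources.add(parsed.netloc.lower())
            path = parsed.path
        else:
            # Anything else is treated as a local file path.
            sources.add("local")
            path = entry
        if PurePosixPath(path).suffix.lower() not in WEB_PAGE_EXTENSIONS:
            non_web_pages += 1
    return {
        "resources": len(entries),
        "sources": len(sources),
        "non_web_pages": non_web_pages,
    }


if __name__ == "__main__":
    notes = [
        "# rock climbing in SE Asia",
        "http://example.org/climbing/index.html",
        "http://example.org/routes.pdf",
        "C:\\docs\\guide.doc",
    ]
    print(summarize_manifest(notes))
```

The extension test is a heuristic and will misjudge some URLs, so treat the counts as a reminder of the targets (about 20 resources, at least 10 sources, at least 2 non-web-page resources) rather than as a verdict.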

Once you have decided which documents to incorporate into your personal interest digital library, download and install Greenstone for your computer. A link to the installation file for Windows is located in the reference section at the end of this page. I have personally installed both the Un*x and Windows versions. Please note the installation requirements: Java, Perl, ImageMagick (for image collections only) and sufficient disk space. I will not address your personal installation problems unless you have already turned to the web and the support team for help and are still unable to find an answer. Also install the "Export to CD-ROM" module listed in the Reference section on this page. You will need this to compile your homework submission into a standalone CD-ROM.

Once Greenstone is installed, you can follow the Greenstone User's Guide on how to use the GLI. You will need to invoke the Greenstone Librarian Interface (GLI), a Java application which looks something like the screenshot below. The first step is to create a new collection and add documents to it.

Greenstone Librarian Interface screenshot

Under the "File" menu, choose "New..." to create a new collection. When prompted, use the Dublin Core metadata set as the initial metadata set for your collection. Once the operation is completed, your GLI should look like the above (where "test" should be replaced by the name of your collection). Use the "Download" panel to download each of the resources that you noted earlier. Then switch to the "Gather" panel and drag the appropriate ones into your collection. In GLI, web sites, MS Word documents and .PDFs are all easy to integrate. When you think you have a reasonable first draft of a collection, switch to the "Create" panel and build your digital library.

Once you have built the initial collection, we have a digital library! You can preview the collection through GLI. This is a library because it was specifically collected by you to be informative on a specific topic. However, we are not done, because the documents are not suitably integrated as a library: we are missing metadata that describe each document in the context of the collection and a description of the collection and its attributes. The remaining parts of the assignment should be done after the "Classification" lecture.

Let's first handle the problem of metadata. As your collection is on a particular topic, there are some attributes that are specific to the types of documents that you have in your collection (e.g., for the climbing example, many of the documents I choose may be logs of climbers going to a particular geographic location). Some of these attributes would be good access points for users of your library. We need to create metadata for these fields to enable searching that is limited to these fields.

We do this by "enriching" the documents with metadata. Quit GLI and start the Greenstone Editor for Metadata Sets (GEMS). Follow the steps to set up a new metadata set for your particular collection. Your metadata set should contain at least 2 elements. You'll have to decide what the appropriate metadata for your collection are. Remember to save the new metadata set. Once you have defined the new metadata set, quit GEMS and restart GLI. In the "Design" panel, you can add a new metadata set to your collection (the last option on the left-hand command menu). After it has been added, your personal metadata fields are available to take on values for each document in your collection. Switch back to the "Gather" interface and fill out the appropriate metadata for each document, including the default Dublin Core metadata.
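Before opening GEMS, it can help to plan your metadata on paper or in a small script. The sketch below is only a planning aid and does not read or write any Greenstone file: the element names "Location" and "Difficulty" are hypothetical choices for the rock climbing example, the three Dublin Core elements are just a sample of that set, and the function name is invented for illustration.

```python
# Hypothetical custom metadata set for the rock climbing example.
CUSTOM_ELEMENTS = ["Location", "Difficulty"]
# A few Dublin Core elements; which ones you fill in is your decision.
DUBLIN_CORE_ELEMENTS = ["Title", "Creator", "Subject"]


def missing_metadata(documents, required):
    """Return {document name: [elements without a value]} for incomplete documents."""
    gaps = {}
    for name, metadata in documents.items():
        # An element counts as missing if it is absent or only whitespace.
        missing = [el for el in required if not metadata.get(el, "").strip()]
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    planned = {
        "krabi_log.html": {
            "Title": "Krabi trip log",
            "Creator": "A. Climber",
            "Subject": "rock climbing",
            "Location": "Krabi",
            "Difficulty": "6a",
        },
        "gear.pdf": {"Title": "Gear list"},
    }
    print(missing_metadata(planned, DUBLIN_CORE_ELEMENTS + CUSTOM_ELEMENTS))
```

A table like this makes it easier to spot documents for which a planned element has no sensible value, which may be a sign that the element is a poor fit for the collection.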

You will then annotate the documents with the appropriate metadata. For this metadata to be useful, we have to make it available for searching and browsing. Decide on two additional access points by which your collection should be searchable and/or browsable. These tasks can be done in the "Design" panel of GLI. For searching, first create the search indices and then create the new search type. For browsing, build a new browsing classifier that will organize your documents by the new metadata field(s).
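To make the two kinds of access point concrete, here is a conceptual sketch in Python. It is not how Greenstone implements indices or classifiers, and the function names, the "Location" field and the sample documents are all invented for illustration. It shows only the idea: a browsing classifier groups documents under the values of one metadata field, and a fielded search looks at that field alone rather than at the full text.

```python
from collections import defaultdict


def browse_by(documents, field):
    """Group document names under each value of one metadata field."""
    shelves = defaultdict(list)
    for name, metadata in documents.items():
        shelves[metadata.get(field, "(no value)")].append(name)
    # Sort values and names so the browse list is stable.
    return {value: sorted(names) for value, names in sorted(shelves.items())}


def search_field(documents, field, query):
    """Return names of documents whose value for `field` contains `query`."""
    needle = query.lower()
    return sorted(
        name
        for name, metadata in documents.items()
        if needle in metadata.get(field, "").lower()
    )


if __name__ == "__main__":
    docs = {
        "a.html": {"Location": "Krabi, Thailand"},
        "b.pdf": {"Location": "Batu Caves, Malaysia"},
        "c.html": {"Location": "Krabi, Thailand"},
    }
    print(browse_by(docs, "Location"))
    print(search_field(docs, "Location", "thailand"))
```

When choosing your access points, a field whose values repeat across documents tends to browse well, while a field with free-form values may be better served by search.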

That's it! Greenstone has many more capabilities but we don't have the time to explore them. If you are particularly interested in aspects of Greenstone, you can consider modifying Greenstone for your course project or building a full-fledged DL as an implementation course project.

What to turn in

I will be assessing your homework by running your library from the CD-ROM that you submit. Run the "Export to CD-ROM" function from the "File" menu. This should result in a screen like the one below.

Screenshot of GLI with export function finished

Write a README file that describes your submission. In your README file, do not include any information that would identify you. Document what the scope of your collection includes, what it doesn't, and describe the metadata and access points that you used and your rationale for them. The README should not be longer than 1000 words and can be included as the front page to the Greenstone DL. You should try to address the concerns in the grading scheme below in the README.
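A quick way to check the length limit is sketched below in Python. It assumes that a "word" is any whitespace-separated token, which is my own reading and may differ slightly from what a word processor reports; the function names are invented for illustration.

```python
WORD_LIMIT = 1000  # limit stated in the assignment text


def count_words(text):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def readme_within_limit(text, limit=WORD_LIMIT):
    """Return True if the README text is no longer than the limit."""
    return count_words(text) <= limit


if __name__ == "__main__":
    sample = "Scope: rock climbing in SE Asia."
    print(count_words(sample), readme_within_limit(sample))
```

To check your actual file, read its contents into a string first and pass that string to `readme_within_limit`.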

Take the resulting file and use any CD-ROM writing software to write the file to a CD-ROM. Include the README.txt file that you authored as well. You should label the CD-ROM itself (not the sleeve) with your matric number and generic (u|g)* email using a safe marker. Please use all capital letters when writing your matric number (matric numbers should start with U, HT or HD for most students). Do not write your name or any other information that easily identifies you on your submission, as I want to assess your submission objectively.

Grading scheme

Your grade will be conditional on the following aspects of your assignment:

Although very important in actual digital libraries, modifications to the cosmetic appearance of Greenstone will not be factored in your grade, unless I feel it impacts how potential users would access the information.

Due date and late policy

According to the syllabus, this homework is due by 6 Sep 11:59:59 pm SGT. The late policy for submissions applies as set forth on the "Grading" page.


Min-Yen Kan | Created on: Thu Jun 16 09:04:02 GMT-8 2005 | Version: 1.0 | Last modified: Tue Sep 13 16:15:41 2005