CS 5244 - Digital Libraries

NUS SoC, 2004/2005, Semester I
LT 34, Tuesdays 18:30-20:30



(Last updated on: Sun Oct 31 17:11:55 GMT-8 2004 )

Any questions about this information should be directed to the general forum on IVLE. Note that no university holidays affect the scheduling for this course. Supplemental readings are marked with a "*".

Unit Date Description Deadlines
What is a DL Week 0:
(3 Aug)
Class cancelled due to school policy.
Building a DL Week 1:
(10 Aug)
Orientation / Fundamentals of information retrieval
Course information, policies and scope, breadth of research encompassed by DLs.
Document indexing, TF*IDF, Boolean retrieval model, Vector space model, Algebraic models of retrieval.

Slides: Lecture Notes [ .htm ] [ .pdf ]

Readings:

  • Vannevar Bush (1945) As we may think, The Atlantic Monthly (selected parts during class) [ Section 6 ] [ Section 7 ]
  • Baeza-Yates and Ribeiro-Neto (1999), Chapter 2.1 - 2.5.5
  • *Lesk (1997), Chapter 1, Evolution of Libraries
  • *Lesk (1997), Chapter 2, Text Access Methods
  • *Lesk (1997), Chapter 5, Sections 5.5-5.8, Knowledge Representation Methods
  • *Witten, Moffat and Bell (1999), Chapter 3.1 - 3.3 (up to Nonparameterized models) & 3.7
  • *Witten, Moffat and Bell (1999), Chapter 4.4 - 4.6 (Query processing)
  • *Witten, Moffat and Bell (1999), Chapter 5.1 - 5.2 (Index Construction)
Week 2:
(17 Aug)
Storing information
Multimedia encodings: text (SGML, Unicode, TEI, XML), page images (CCITT FAX IV), images (JPEG, PNG, PS), video (MPEG), audio (MP3, WAV), synchronized media (SMIL).

Slides: [ .htm ] [ .pdf ]
Self-study module: Huffman Encoding [ .htm ] [ .pdf ]

Readings:

  • David Bainbridge, Craig G. Nevill-Manning, Ian H. Witten, Lloyd A. Smith, Rodger J. McNab (1999) Towards a Digital Library of Popular Music. (Available from the ACM Digital Library or LINC, or directly from Nevill-Manning's website.)
  • Lesk (1997), Chapter 3, Images of Pages.
  • Lesk (1997), Chapter 4, Multimedia Storage and Access.
  • *Witten, Moffat and Bell (1999), Chapter 6.1 - 6.2, 6.5 (GIF/PNG section only), 6.6 (JPEG section)
  • *Witten, Moffat and Bell (1999), Chapter 7, Section 1
  • *Witten, Moffat and Bell (1999), Chapter 8.
Pick and finalize survey paper area (in class).
Week 3:
(24 Aug)
Classification
Traditional classification schemes (DDC, LCSH, MeSH), metadata types, Dublin Core, Warwick Framework.

Slides: [ .htm ] [ .pdf ]
Self-study module: WordNet [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 5, Sections 5.1-5.3, Knowledge Representation Methods.
  • *Marshall, Catherine (1998), Making Metadata: a study of metadata creation for a mixed physical-digital collection. In Proc. of Digital Libraries 1998.
  • *Ipeirotis et al. (2002) Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with the Open Archives Initiative, in Proc. of the Second ACM+IEEE Joint Conference on Digital Libraries (JCDL), 2002.
  • *Vellucci, Sherry L. "Metadata." Annual Review of Information Science and Technology 33 (1998): 187-222. On reserve from the RBR.
Week 4:
(31 Aug)
DL policy, interoperability and access rights
Identifiers: Open Archives Initiative, metadata harvesting, OpenURL. DL economics and social policy and issues.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapters 9-10, Economics and Intellectual Property Rights.
  • *Lagoze and van de Sompel (2001) The Open Archives Initiative: Building a low-barrier interoperability framework (.pdf link)
  • *Arms (2000) Chapter 12, Object models, identifiers and structural metadata.
Week 5:
(7 Sep)
One-hour Midterm
Short session to catch up with material presented thus far.

Slides (are an abbreviated form of last week's): [ .htm ] [ .pdf ]
Midterm test [ .pdf ]

Using the DL Week 6:
(14 Sep)
Bibliometrics and its applications
Laws of bibliometrics, citations and references, PageRank, HITS.

Slides: [ .htm ] [ .pdf ]

Readings:

  • For more details on Google in general: Brin and Page (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine.
  • For more details on the HITS algorithm: Jon Kleinberg (1998) Authoritative sources in a hyperlinked environment. In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York.
  • Steve Lawrence, C. Lee Giles, Kurt Bollacker (1999) Digital Libraries and Autonomous Citation Indexing
  • ISI's Impact Factor: Essays by Eugene Garfield on citation analysis (Re-printed from Current Contents)
  • *Simone Teufel and Marc Moens (2000) What's yours and what's mine: Determining Intellectual Attribution in Scientific Text. Proceedings of EMNLP, Hong Kong, Oct 2000.
  • *Steve Hitchcock, Arouna Woukeu, Tim Brody, Les Carr, Wendy Hall and Stevan Harnad (2003) Evaluating Citebase: Key Usability Results
Survey papers due in IVLE by the 17th. Form project teams and schedule a meeting with Min to go over your project proposal.
Week 7:
(18 Sep)
1 hr make-up lecture
Semantic Web
Motivation for the Semantic Web, SW Layer cake, overview on RDF, OWL.

Slides: [ .htm ] [ .pdf ]
Project proposal slides: [ .pdf ] [ .htm ]

Readings:

  • Tim Berners-Lee, James Hendler and Ora Lassila (2001) The Semantic Web, Scientific American, May 2001.
  • Frank Manola and Eric Miller (2004) RDF Primer, W3C Recommendation. Read Sections 1, 2.1-2.2, 2.5, 3.1.
  • *James Hendler (2003) Science and the Semantic Web, Science.
  • *Michael K. Smith, Chris Welty, Deborah McGuinness (2003) OWL Web Ontology Language Guide, W3C Recommendation. Read Sections 1-3 and the relevant portions of Section 6. You should read the text under the main headers and the first subheader (e.g., Section 1, Section 1.1). You can safely skip the material in the sub-subheaders (e.g., 1.1.1).
Mid-semester Break (Sun 19 Sep - Thu 23 Sep 2004)
Week 8:
(29 Sep)
Information seeking
Reference interviews, Information seeking process, Anomalous state of knowledge.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Nicholas Belkin, R. Oddy and H. Brooks (1980) ASK - Anomalous State of Knowledge - Part I
  • Daniel Rose and Danny Levinson (2004) Understanding User Goals in Web Search. WWW 2004.
  • Marcia Bates (1989) The Design of Browsing and Berrypicking Techniques for the Online Search Interface. Online Review 13 (October 1989): 407-424.
  • *Nardi, Bonnie A. (1999) Librarians: A Keystone Species, In Information Ecologies, MIT Press. On Reserve in the RBR.
Project proposals returned.
Week 9:
(5 Oct)
User interfaces for querying and displaying documents
Survey of query (text, Venn, faceted metadata) and document displays (ranked list, InfoCrystal, Table Lens, TileBars).

Slides: [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Sections 7.1-7.5.
  • Hearst, Marti A. (1999) User Interfaces, In Baeza-Yates and Ribeiro-Neto (eds.), Modern Information Retrieval, 1999.
Survey paper grades.
Week 10:
(12 Oct)
Usage patterns in the DL
Usage mining. How DLs and web sites are used, and their relation to information seeking and HCI.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Bishop (1998) Digital Libraries and Knowledge Disaggregation: The Use of Journal Article Components, DL 1998.
  • Tauscher and Greenberg (1997) Revisitation Patterns in World Wide Web Navigation. CHI 1997.
  • Milic-Frayling et al. (2004) SmartBack: Supporting Users in Back Navigation. WWW '04.
  • *Choo, Detlor, and Turnbull (2000) Information Seeking on the Web: An Integrated Model of Browsing and Searching. First Monday.
  • *Kang and Kim (2003) Query Type Classification for Web Document Retrieval. SIGIR '03.
Midterm grades returned.
Midterm Answers [ .pdf ]
Week 11:
(19 Oct)
Evaluation
Traditional library evaluation, review of standard IR evaluation metrics.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Section 7.6
  • Witten, Moffat and Bell (1999) Managing Gigabytes, Section 4.5.
  • *Baker and Lancaster (1991) The Measurement and Evaluation of Library Services, Information Resources Press.
Week 12:
(26 Oct)
Extended services for the DL
Collaborative filtering, Recommender systems, Reputation shilling, Authorship attribution, Plagiarism detection.

Slides: [ .htm ] [ .pdf ]
Addendum on Naive Bayes: [ .htm ] [ .pdf ]

Readings:

  • Breese, Heckerman, Kadie (1998) Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proc. of Uncertainty in AI [ .ps ]
  • Khmelev and Teahan (2004) A repetition based measure for verification of text collections and for text categorization, Proc. of WWW 2004.
  • *Lam and Riedl (2004) Shilling recommender systems for fun and profit. In Proc. of WWW 2004. [ ACM Portal link ]
  • *Karlgren & Cutting (1994) Recognizing Text Genres with Simple Metrics Using Discriminant Analysis, Proc. of COLING-94.
  • *Shivakumar & Garcia-Molina (1995) SCAM: A copy detection mechanism for digital documents, Proc. of DL 95.
Week 13:
(2 Nov)
Instant Messaging, Email, Web logs and Wikis: New media for information
Characteristics and their use, tracking knowledge development in new media, and course revision.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Bellotti et al. (2003) Integrating tools and tasks: Taking email to task: the design and evaluation of a task management centered email tool, Proc. CHI 2003
  • Gruhl et al. (2004) Information diffusion through blogspace, Proc. WWW 2004.
  • *Kleinberg (2003) Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, 7(4)
  • *Jackson et al. (2003) Understanding email interaction increases organizational productivity, CACM
  • *Christopher Campbell et al. (2003) Expertise identification using email communications, Proc. CIKM 2003.
Week 14:
(9 Nov)
Project Presentations
No class. Poster presentations in lieu of class.
Final project poster presentation.
Reading Week (Fri 12 Nov - Thu 18 Nov 2004)

Exam Date: Tuesday, 30 Nov 2004, 7:30 pm


Min-Yen Kan <kanmy@comp.nus.edu.sg> Created on: Mon Dec 1 19:36:22 2003 | RCS: $Id: syllabus.html,v 1.2 2004/08/11 06:00:38 kanmy Exp kanmy $ | Version: 1.0 | Last modified: Mon Nov 1 16:59:25 2004