CS 5244 - Digital Libraries

NUS SoC, 2004/2005, Semester I
LT 34, Tuesdays 18:30-20:30


(Last updated on: Tue Oct 25 17:49:59 GMT-8 2005 )

The first four weeks will focus on building a digital library and will be reinforced by a practical homework assignment. The remaining lectures will focus on using digital libraries.

Any questions about this information should be directed to the general forum on IVLE. Supplemental readings are marked with a "*".

Date Description Deadlines
Week 1:
(16 Aug)
Orientation / Fundamentals of information retrieval
Course information, policies and scope, breadth of research encompassed by DLs.
Document indexing, TF*IDF, Boolean retrieval model, Vector space model, Algebraic models of retrieval.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Baeza-Yates and Ribeiro-Neto (1999), Chapter 2.1 - 2.5.5
  • Bush (1945) As we may think, The Atlantic Monthly (selected parts during class) [ Sections 6-7]
  • Taylor (2004), Chapter 1, Organization of Recorded Information
  • Witten, Moffat and Bell (1999), Chapter 3.1 - 3.3 (up to Nonparameterized models) & 3.7
  • *Lesk (1997), Chapter 1, Evolution of Libraries
  • *Lesk (1997), Chapter 2, Text Access Methods
  • *Lesk (1997), Chapter 5, Sections 5.5-5.8, Knowledge Representation Methods
  • *Witten, Moffat and Bell (1999), Chapter 4.4 - 4.6 (Query processing)
  • *Witten, Moffat and Bell (1999), Chapter 5.1 - 5.2 (Index Construction)
Week 2:
(23 Aug)
Storing information
Multimedia encodings: text (SGML, Unicode, TEI, XML), page images (CCITT FAX IV), images (JPEG, PNG, PS), video (MPEG), audio (MP3, WAV), synchronized media (SMIL).

Slides: [ .htm ] [ .pdf ]
Self-study module: Huffman Encoding [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 3, Images of Pages.
  • Lesk (1997), Chapter 4, Multimedia Storage and Access.
  • *Bainbridge, Nevill-Manning, Witten, Smith and McNab (1999) Towards a Digital Library of Popular Music. Available from the ACM Digital Library, LINC, or directly from Nevill-Manning's website.
  • *Witten, Moffat and Bell (1999), Chapter 6.1 - 6.2, 6.5 (GIF/PNG section only), 6.6 (JPEG section)
  • *Witten, Moffat and Bell (1999), Chapter 7, Section 1
  • *Witten, Moffat and Bell (1999), Chapter 8.
  • Pick and finalize survey paper area (in class).
  • Homework #1 out (Building a digital library with Greenstone)
  • Greenstone tutorial
Week 3:
(Make-up lecture: 29 Aug 2005 SR 4 (SoC 1, Lvl 6 #12))
Classification
Traditional classification schemes (DDC, LCSH, MeSH), metadata types, Dublin Core, Warwick Framework.

Slides: [ .htm ] [ .pdf ]
Self-study module: WordNet [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 5, Sections 5.1-5.3, Knowledge Representation Methods.
  • Taylor (2004), Chapter 4, Encoding Standards.
  • Taylor (2004), Chapter 6, Metadata.
  • *Marshall, Catherine (1998), Making Metadata: A study of metadata creation for a mixed physical-digital collection. In Proc. of Digital Libraries 1998.
  • *Ipeirotis et al. (2002) Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with the Open Archives Initiative, in Proc. of the Second ACM+IEEE Joint Conference on Digital Libraries (JCDL), 2002.
  • *Vellucci, Sherry L. "Metadata." Annual Review of Information Science and Technology 33 (1998): 187-222. On reserve from the RBR.
Week 4:
(30 Aug)
DL policy, interoperability and access rights
Identifiers: Open Archives Initiative, OpenURL. DL economics and social policy and issues.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapters 9-10, Economics and Intellectual Property Rights.
  • Arms (2000) Chapter 12, Object models, identifiers and structural metadata.
  • Arms (2000) Chapter 6, Economic and legal issues.
  • *Lagoze and van de Sompel (2001) The Open Archives Initiative: Building a low-barrier interoperability framework (.pdf link)
  • *IPOS - Intellectual Property Office of Singapore - especially the section "Copyright and the Internet".
Week 5:
(6 Sep)
Bibliometrics and its applications
Laws of bibliometrics, Citations and references, Pagerank, HITS.

Slides: [ .htm ] [ .pdf ]

Readings:

  • For more details on Google in general: Brin and Page (1998) The Anatomy of a Search Engine.
  • For more details on the HITS algorithm: Jon Kleinberg (1998) Authoritative sources in a hyperlinked environment. In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York.
  • Steve Lawrence, C. Lee Giles, Kurt Bollacker (1999) Digital Libraries and Autonomous Citation Indexing
  • ISI's Impact Factor: Essays by Eugene Garfield on citation analysis (Re-printed from Current Contents)
  • *Simone Teufel and Marc Moens (2000) What's yours and what's mine: Determining Intellectual Attribution in Scientific Text. Proceedings of EMNLP, Hong Kong, Oct 2000.
  • *Steve Hitchcock, Arouna Woukeu, Tim Brody, Les Carr, Wendy Hall and Stevan Harnad (2003) Evaluating Citebase: Key Usability Results
  • Homework #1 due directly to me by 6 Sep, 11:59 pm SGT. You may submit your CDROM after class or on 7 Sep during office hours.
Week 6:
(13 Sep)
Information seeking
Reference interviews, Information seeking process, Anomalous state of knowledge.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Nicholas Belkin, R. Oddy and H. Brooks (1980) ASK - Anomalous State of Knowledge - Part I
  • Daniel Rose and Danny Levinson (2004) Understanding User Goals in Web Search. WWW 2004.
  • Marcia Bates (1989) The Design of Browsing and Berrypicking Techniques for the Online Search Interface. Online Review 13 (October 1989): 407-424.
  • *Nardi, Bonnie A. (1999) Librarians: A Keystone Species, In Information Ecologies, MIT Press. On Reserve in the RBR.
  • Survey papers due in IVLE workbin by 13 Sep 11:59 pm SGT.
  • Form project teams and schedule a meeting with Min to go over your project proposal.
Mid-semester Break (Fri 16 Sep - Thu 22 Sep 2005)
Week 7:
(27 Sep)
User interfaces for querying and displaying documents
Survey of query interfaces (text, Venn, faceted metadata) and document displays (ranked list, InfoCrystal, Table Lens, TileBars).

Slides: [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Sections 7.1-7.5.
  • Hearst, Marti A. (1999) User Interfaces, In Baeza-Yates and Ribeiro-Neto (eds.), Modern Information Retrieval, 1999.
  • Project proposals returned.
  • Homework #1 grades returned.
Week 8:
(10 Oct 8-10 *AM*, LT 33)
Usage patterns in the DL
Usage mining. How DLs and web sites are used, and their relation to information seeking and HCI.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Bishop (1998) Digital Libraries and Knowledge Disaggregation: The Use of Journal Article Components, DL 1998.
  • Tauscher and Greenberg (1997) Revisitation Patterns in World Wide Web Navigation. CHI 1997.
  • Milic-Frayling et al. (2004) SmartBack: Supporting Users in Back Navigation. WWW '04.
  • *Choo, Detlor, and Turnbull (2000) Information Seeking on the Web: An Integrated Model of Browsing and Searching. First Monday.
  • *Kang and Kim (2003) Query Type Classification for Web Document Retrieval. SIGIR '03.
Week 9:
(11 Oct)
Computational Analysis of Genre, Authorship and Duplication
Authorship attribution, Plagiarism detection.
Self-Study on Naive Bayes: [ .htm ] [ .pdf ]

Slides: [ .htm ] [ .pdf ]

Readings:

  • Khmelev and Teahan (04) A repetition based measure for verification of text collections and for text categorization, Proc. of WWW 2004.
  • Karlgren & Cutting (94) Recognizing Text Genres with Simple Metrics Using Discriminant Analysis, Proc. of COLING-94.
  • *Mosteller & Wallace (63) Inference in an authorship problem, J American Statistical Association 58(3)
  • *de Vel, Anderson, Corney & Mohay (01) Mining Email Content for Author Identification Forensics, SIGMOD Record
  • *Foster (00) Author Unknown. Owl Books PE1421 Fos
  • *Biber (89) A typology of English texts, Linguistics, 27(3)
  • *Lee and Myaeng (02) Text genre classification with genre-revealing and subject-revealing features, SIGIR 02
  • *Shivakumar & Garcia-Molina (95) SCAM: A copy detection mechanism for digital documents, Proc. of DL 95
  • *Belkouche et al. (04) Plagiarism Detection in Software Designs, ACM Southeast Conference
  • *Bilenko and Mooney (03) Adaptive duplicate detection using learnable string similarity measures, Proc. of KDD 03.
  • *Ramaswamy et al. (04) Automatic detection of fragments in dynamically generated web pages, Proc. WWW 04.
  • Survey papers returned.
  • Homework #2 out - (Authorship attribution of Amazon.com reviews)
  • SVMlight tutorial (immediately following class)
Week 10:
(18 Oct)
Collaborative Filtering

Readings:

  • Breese, Heckerman, Kadie (98) Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proc. of Uncertainty in AI [ .ps ]
  • *Lam and Riedl (04) Shilling recommender systems for fun and profit. In Proc. of WWW 2004. [ ACM Portal link ]
Week 11:
(25 Oct)
Instant Messaging, Email, Web logs and Wikis: New media for information
Characteristics and their use, tracking knowledge development in new media, and course revision.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Bellotti et al. (2003) Integrating tools and tasks: Taking email to task: the design and evaluation of a task management centered email tool, Proc. CHI 2003
  • Gruhl et al. (2004) Information diffusion through blogspace, Proc. WWW 2004.
  • *Kleinberg (2003) Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, 7(4)
  • *Jackson et al. (2003) Understanding email interaction increases organizational productivity, CACM
  • *Campbell et al. (2003) Expertise identification using email communications, Proc. CIKM 2003.
  • Homework #2 due in IVLE workbin by 25 Oct 11:59 pm SGT
Week 12:
(1 Nov)
Deepavali
No class. Poster presentations will be held in lieu of class on Saturday, 19 Nov.
Reading Week (Fri 12 Nov - Thu 18 Nov 2004)
19 Nov, Sat Project Presentations
Poster presentations in Min's office (S15 05-05).
  • Homework #2 returned
Final Exam (Tue 22 Nov 7:30-9:30 pm)

Min-Yen Kan <kanmy@comp.nus.edu.sg> Created on: Mon Dec 1 19:36:22 2003 | RCS: $Id: syllabus.html,v 1.2 2004/08/11 06:00:38 kanmy Exp kanmy $ | Version: 1.0 | Last modified: Thu Jul 27 15:34:49 2006