CS 5244 - Digital Libraries

NUS SoC, 2006/2007, Semester I
LT 34, Tuesdays 18:30-20:30



We will have a short orientation meeting (about 30 minutes) on 8 Aug. If you cannot make it, just review the slides in IVLE. The first four weeks will focus on building a digital library and will be reinforced by a practical homework assignment. The remaining lectures will focus on using digital libraries.

I will be away from 3-9 October, so I'm rescheduling that week's class one day earlier, to Monday 2 October (venue to be announced). Hari Raya Puasa (24 Oct) falls on our lecture day, so we have one fewer lecture as a result.

The lecture notes here are not as complete as those in IVLE. You should use the ones in IVLE if possible. Any questions about this information should be directed to the general forum on IVLE. Supplemental readings are marked with a "*".

The hyperlinks here all work as of Fri Jul 14 14:35:52 GMT-8 2006, when I updated this page. Use a search engine with the appropriate text if the links below stop working.

Date Description Deadlines
Week 0:
(8 Aug)
Orientation
Course information, policies and scope

Slides: [ .htm ] [ .pdf ] (same link as next week)

Deadlines:

  • Please fill out the pre-flight survey in IVLE
Week 1:
(15 Aug)
Orientation / Fundamentals of information retrieval
Course information, policies and scope, breadth of research encompassed by DLs.
Document indexing, TF*IDF, Boolean retrieval model, Vector space model, Algebraic models of retrieval.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Baeza-Yates and Ribeiro-Neto (1999), Chapter 2.1 - 2.5.5
  • Bush (1945) As we may think, The Atlantic Monthly (selected parts during class)
  • Taylor (2004), Chapter 1, Organization of Recorded Information
  • Witten, Moffat and Bell (1999), Chapter 3.1 - 3.3 (up to Nonparameterized models) & 3.7
  • *Lesk (1997), Chapter 1, Evolution of Libraries
  • *Lesk (1997), Chapter 2, Text Access Methods
  • *Lesk (1997), Chapter 5, Sections 5.5-5.8, Knowledge Representation Methods
  • *Witten, Moffat and Bell (1999), Chapter 4.4 - 4.6 (Query processing)
  • *Witten, Moffat and Bell (1999), Chapter 5.1 - 5.2 (Index Construction)
Week 2:
(22 Aug)
Storing information
Multimedia encodings: text (SGML, Unicode, TEI, XML), page images (CCITT FAX IV), images (JPEG, PNG, PS), video (MPEG), audio (MP3, WAV), synchronized media (SMIL).

Slides: [ .htm ] [ .pdf ]
Self-study module: Huffman Encoding [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 3, Images of Pages.
  • Lesk (1997), Chapter 4, Multimedia Storage and Access.
  • *Bainbridge, Nevill-Manning, Witten, Smith and McNab (1999) Towards a Digital Library of Popular Music. (Available from the ACM Digital Library or LINC.)
  • *Witten, Moffat and Bell (1999), Chapter 6.1 - 6.2, 6.5 (GIF/PNG section only), 6.6 (JPEG section)
  • *Witten, Moffat and Bell (1999), Chapter 7, Section 1
  • *Witten, Moffat and Bell (1999), Chapter 8.

Deadlines:

  • Pick and finalize survey paper area (in class).
  • Homework #1 out (Building a digital library with Greenstone)
  • Greenstone tutorial
Week 3:
(29 Aug)
Classification
Traditional classification schemes (DDC, LCSH, MeSH), metadata types, Dublin Core, Warwick Framework.

Slides: [ .htm ] [ .pdf ]
Self-study module: WordNet [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 5, Sections 5.1-5.3, Knowledge Representation Methods.
  • Taylor (2004), Chapter 4, Encoding Standards.
  • Taylor (2004), Chapter 6, Metadata.
  • *Marshall, Catherine (1998), Making Metadata: a study of metadata creation for a mixed physical-digital collection. In Proc. of Digital Libraries 1998.
  • *Ipeirotis et al. (2002) Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with the Open Archives Initiative, in Proc. of the Second ACM+IEEE Joint Conference on Digital Libraries (JCDL), 2002.
  • *Vellucci, Sherry L. "Metadata." Annual Review of Information Science and Technology 33 (1998): 187-222. On reserve from the RBR.
Week 4:
(5 Sep)
DL policy, interoperability and access rights
Identifiers: Open Archives Initiative, OpenURL. DL economics, social policy and related issues.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapters 9-10, Economics and Intellectual Property Rights.
  • Arms (2000) Chapter 12, Object models, identifiers and structural metadata.
  • Arms (2000) Chapter 6, Economic and legal issues.
  • *Lagoze and van de Sompel (2001) The Open Archives Initiative: Building a low-barrier interoperability framework (.pdf link)
  • *IPOS - Intellectual Property Office of Singapore - Especially section "Copyright and the Internet".
Week 5:
(12 Sep)
Bibliometrics and its applications
Laws of bibliometrics, Citations and references, PageRank, HITS.

Slides: [ .htm ] [ .pdf ]

Readings:

  • For more details on Google in general: Brin and Page (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine.
  • For more details on the HITS algorithm: Jon Kleinberg (1998) Authoritative sources in a hyperlinked environment. In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York.
  • Steve Lawrence, C. Lee Giles, Kurt Bollacker (1999) Digital Libraries and Autonomous Citation Indexing
  • ISI's Impact Factor: Essays by Eugene Garfield on citation analysis (Re-printed from Current Contents)
  • *Simone Teufel and Marc Moens (2000) What's yours and what's mine: Determining Intellectual Attribution in Scientific Text. Proceedings of EMNLP, Hong Kong, Oct 2000.
  • *Steve Hitchcock, Arouna Woukeu, Tim Brody, Les Carr, Wendy Hall and Stevan Harnad (2003) Evaluating Citebase: Key Usability Results

Deadlines:

  • Homework #1 due mid-class on 12 Sep, 7:30 pm SGT. You may submit your CD-ROM during the class break.
Week 6:
(19 Sep)
User interfaces for querying and displaying documents
Survey of query interfaces (text, Venn, faceted metadata) and document displays (ranked list, InfoCrystal, Table Lens, TileBars).

Slides: [ .htm ] [ .pdf ]

Readings:

  • Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Sections 7.1-7.5.
  • Hearst, Marti A. (1999) User Interfaces, In Baeza-Yates and Ribeiro-Neto (eds.), Modern Information Retrieval, 1999.

Deadlines:

  • Survey papers due in IVLE workbin by 19 Sep 11:59 pm SGT.
  • Form project teams and schedule a meeting with Min to go over your project proposal.
Mid-semester Break (Fri 23 Sep - Fri 30 Sep 2006)
Week 7:
(Special date: Mon, 2 Oct in TR 3 (S16 03-09))
Information seeking
Reference interviews, Information seeking process, Anomalous state of knowledge.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Nicholas Belkin, R. Oddy and H. Brooks (1980) ASK - Anomalous State of Knowledge - Part I (Look for Belkin ASK p1.pdf)
  • Daniel Rose and Danny Levinson (2004) Understanding User Goals in Web Search. WWW 2004.
  • Marcia Bates (1989) The Design of Browsing and Berrypicking Techniques for the Online Search Interface. Online Review 13 (October 1989): 407-424.
  • *Nardi, Bonnie A. (1999) Librarians: A Keystone Species, In Information Ecologies, MIT Press. On Reserve in the RBR.

Deadlines:

  • Project proposals returned.
  • Homework #1 grades returned.
Week 8:
(10 Oct)
Usage patterns in the DL and the Web
Usage mining. How DLs and web sites are used, and their relation to information seeking and HCI.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Bishop (1998) Digital Libraries and Knowledge Disaggregation: The Use of Journal Article Components, DL 1998.
  • Tauscher and Greenberg (1997) Revisitation Patterns in World Wide Web Navigation. CHI 1997.
  • Milic-Frayling et al. (2004) SmartBack: Supporting Users in Back Navigation. WWW '04.
  • *Choo, Detlor, and Turnbull (2000) Information Seeking on the Web: An Integrated Model of Browsing and Searching. First Monday.
  • *Kang and Kim (2003) Query Type Classification for Web Document Retrieval. SIGIR '03.
  • *Eelco Herder (2005) Characterizations of User Web Revisit Behavior. Workshop on Adaptivity and User Modeling in Interactive Systems (ABIS 05).
Week 9:
(17 Oct)
Computational Analysis of Genre, Authorship and Duplication
Authorship attribution, Plagiarism detection.
Self-Study on Naive Bayes: [ .htm ] [ .pdf ]

Slides: [ .htm ] [ .pdf ]

Readings:

  • Khmelev and Teahan (04) A repetition based measure for verification of text collections and for text categorization, Proc. of WWW 2004. [ CiteSeer Link ]
  • Karlgren & Cutting (94) Recognizing Text Genres with Simple Metrics Using Discriminant Analysis, Proc. of COLING-94. [ CiteSeer Link ]
  • *Mosteller & Wallace (63) Inference in an authorship problem, J American Statistical Association 58(3)
  • *de Vel, Anderson, Corney & Mohay (01) Mining Email Content for Author Identification Forensics, SIGMOD Record
  • *Foster (00) Author Unknown. Owl Books PE1421 Fos
  • *Biber (89) A typology of English texts, Linguistics, 27(3)
  • *Lee and Myaeng (02) Text genre classification with genre-revealing and subject-revealing features, SIGIR 02
  • *Shivakumar & Garcia-Molina (95) SCAM: A copy detection mechanism for digital documents, Proc. of DL 95
  • *Belkouche et al. (04) Plagiarism Detection in Software Designs, ACM Southeast Conference
  • *Bilenko and Mooney (03) Adaptive duplicate detection using learnable string similarity measures, Proc. of KDD 03.
  • *Ramaswamy et al. (04) Automatic detection of fragments in dynamically generated web pages, Proc. WWW 04.

Deadlines:

  • Survey papers returned.
  • Homework #2 out (Authorship attribution of Amazon.com reviews)
  • SVMlight tutorial (immediately following class)
Week 10:
(31 Oct)
Social Navigation: Collaborative Filtering

Readings:

  • Breese, Heckerman, Kadie (98) Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proc. of Uncertainty in AI [ CiteSeer link ]
  • *Lam and Riedl (04) Shilling recommender systems for fun and profit. In Proc. of WWW 2004. [ ACM Portal link ]
  • *Wexelblat and Maes (99) Footprints: History-rich tools for information foraging. In Proc. of CHI 1999. [ CiteSeer link ]
  • *Resnick et al. (94) GroupLens: An Open Architecture for Collaborative Filtering of Netnews, Internal Research Report, MIT Center for Coordination Science. [ CiteSeer link ]
  • *Sarwar et al. (01) Item-based collaborative filtering recommendation algorithms. In Proc. of WWW '01 [ CiteSeer link ]
  • *Shardanand and Maes (95) Social Information Filtering: Algorithms for Automating Word of Mouth. In Proc. of CHI '95
  • *Smyth et al. (04) Exploiting Query Repetition and Regularity in an Adaptive Community-Based Web Search Engine. User Modeling and User-Adapted Interaction. 14(5)
Week 11:
(7 Nov)
Library 2.0: New Media
Characteristics of new media and their use; tracking knowledge development and dissemination in new media: email, instant messaging, weblogs, wikis and folksonomies.

Slides: [ .htm ] [ .pdf ]

Readings:

  • Maness (2006) Library 2.0 Theory: Web 2.0 and Its Implications for Libraries. Webology 3(2) 2006.
  • Bellotti et al. (2003) Integrating tools and tasks: Taking email to task: the design and evaluation of a task management centered email tool, Proc. CHI 2003
  • Gruhl et al. (2004) Information diffusion through blogspace, Proc. WWW 2004.
  • Sen et al. (2006) tagging, communities, vocabulary, evolution. Best paper at CHI 2006.
  • *Kleinberg (2003) Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, 7(4)
  • *Jackson et al. (2003) Understanding email interaction increases organizational productivity, CACM
  • *Campbell et al. (2003) Expertise identification using email communications, Proc. CIKM 2003.
  • *Golder and Huberman (2006) Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2) 198-203.

Deadlines:

  • Homework #2 due in IVLE workbin by 7 Nov 11:59 pm SGT
Week 12:
(14 Nov)
No class. Poster presentations in lieu of class; see below.
Reading Week (Fri 12 Nov - Thu 24 Nov 2006)
Mon 20 Nov 6-9 pm Project Presentations
Poster presentations in TR 4, SR 1, TR 5. You will be presenting your poster to me during the SoC Graduate Course Project Poster Session. See http://www.comp.nus.edu.sg/~kanmy/courses/poster_session_sem1_2006/.
Tue 21 Nov 6:30-8:30 pm @ SR4 (SoC 1 06-12) Course Revision

Deadlines:

  • Homework #2 returned
Final Exam (Tue 28 Nov, evening [SR1 S16 3/F, 7:30-9:30pm])

Min-Yen Kan <kanmy@comp.nus.edu.sg> Created on: Mon Dec 1 19:36:22 2003 | RCS: $Id: syllabus.html,v 1.2 2004/08/11 06:00:38 kanmy Exp kanmy $ | Version: 1.0 | Last modified: Mon Nov 27 18:27:22 2006