CS 5244 - Digital Libraries

NUS SoC, 2004/2005, Semester I
LT 34, Tuesdays 18:30-20:30



(Last updated on: Mon Aug 30 15:39:33 GMT-8 2004 )

The survey paper examines a particular aspect of digital libraries or their applications. For the survey, you will have to read at least four papers of high quality and write a paper that not only summarizes the papers' contributions but also clearly differentiates each paper's strengths and weaknesses. Note: you may quote from your sources but you must cite what you quote. Failure to do so constitutes plagiarism, as outlined on the Grading page. The survey paper is currently due on Friday 17 September 2004 at 11:59:59 pm, as mentioned on the syllabus page. Please see the survey on IVLE to express your preferences about the due date.

Here are the topics for the survey paper, along with some suggested readings. Please note that as some of the topics below are very broad, you may have to choose only a subset of the suggested readings to build your survey paper around. Many of the readings that I have suggested come from recent conferences, so you will need to read background work as well. Remember that the four-paper requirement is a minimum; you may have to read many more than four to get a coherent overview of the topic.

  • Automated Collection Building - Ho Van Phong
    Also find papers in: WWW
    • JCDL 2002 Collection synthesis Donna Bergmark Pages: 253 - 262
    • G. Pant, K. Tsioutsiouliklis, J. Johnson, C.L. Giles: Panorama: Extending Digital Libraries with Topical Crawlers. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004).

  • *Bioinformation and Genomic Data in DLs - Steven Halim, Wang Jiren, Chen Ding
    Papers from BIOLINK 2004: http://www.cs.brandeis.edu/~jamesp/biolink2004/schedule.html. And papers from ACL 2003 Workshop on Biomedicine http://www-tsujii.is.s.u-tokyo.ac.jp/ACL03/bio_program.html
    • Zoé Lacroix, Omar Boucelma, Mehdi Essid: The biological integration system. 45-49, WIDM 2003 http://doi.acm.org/10.1145/956699.956709
    • S. B. Davidson et al. BioKleisli: a digital library for biomedical researchers. Intnl. J. on Digital Libraries, 1(1):36
    • Erjavec, T., Kim, J.D., Ohta, T., Tateisi, Y. & Tsujii, J. " Encoding Biomedical Resources in TEI: The Case of the GENIA Corpus" NLP in Biomedicine ACL 2003 Workshop Program
    • Finding Gene Names Using FlyBase www.cs.brandeis.edu/~jamesp/biolink2004/papers/pdf/BIO004.pdf

  • Correction and Analysis of User Queries or Documents - Zhang Li
    Document correction:
    • Bibliographic attribute extraction from erroneous references based on a statistical model Atsuhiro Takasu, JCDL 2003
    • SIGIR 2002. Seung-Taek Park, David M. Pennock, C. Lee Giles, Robert Krovetz Analysis of lexical signatures for finding lost or related documents

  • *Digital Library Social Policy - Gary Lim, Noel Ong, Paulynn Ong
    Digital Divide:
    • Hoffman. The Evolution of the Digital Divide: How Gaps in Internet Access May Impact Electronic Commerce http://www.ascusc.org/jcmc/vol5/issue3/hoffman.html
    • Bridging the Digital Divide: The Story of the Free Internet ... http://csdl.computer.org/comp/proceedings/hicss/2003/1874/05/187450140b.pdf.
    • Home Internet Use in Low-Income Families: Is Access Enough to Eliminate the Digital Divide? / Linda A. Jackson, Gretchen Barbatsis, Frank A. Biocca, Alexander von Eye, Yong Zhao, Hiram E. Fitzgerald In Media access : social and psychological dimensions of new technology use / edited by Erik P. Bucy, John E. Newhagen.

    Information Ecology
    • Nardi, Bonnie A. (1999) Librarians: A Keystone Species, In Information Ecologies, MIT Press. On Reserve in the RBR.
    • Adams and Blandford. The developing roles of digital library intermediaries www.uclic.ucl.ac.uk/annb/DLUsability/Intermediaries04AaAb.pdf

    Preservation:
    • Sully, Sarah E. "JSTOR: An IP Practitioner's Perspective." D-Lib Magazine. January 1997. Online. Available: http://www.dlib.org/dlib/january97/01sully.html.
    • Michael A. Keller, Vicky Reich, and Andrew Herkovic, "What is a library anymore anyway?", First Monday http://www.firstmonday.org/issues/issue8_5/keller/index.html Volume 8, Number 5, 2003.
    • Vicky Reich & David S. H. Rosenthal, D-Lib Magazine, June 2001 Volume 7 Number 6. http://www.dlib.org/dlib/june01/reich/06reich.html
    • Fresko, M. (1995) Long Term Preservation of Electronic Materials (1995). A Report of a Workshop Organised by JISC/British Library, held at the University of Warwick on 27-28 November 1995. British Library R & D Report 6238. http://www.ukoln.ac.uk/services/papers/bl/rdr6238/paper.html
    • Press Release on the Internet Archive http://www.archive.org/about/press_release.php

  • Examples of Domain-Specific DLs

  • Intelligent Agents in DLs

  • Interoperability between DLs

  • Metadata Extraction and Indexing - Chia Hoo Hon, Zhang Xia
    A study of manual methods:
    • Catherine C. Marshall, Making metadata: a study of metadata creation for a mixed physical-digital collection, Proceedings of the third ACM conference on Digital libraries, p.162-171, June 23-26, 1998, Pittsburgh, Pennsylvania, United States http://doi.acm.org/10.1145/276675.276693

    Text:
    • Nina Wacholder, David K. Evans, Judith Klavans: Automatic identification and organization of index terms for interactive browsing. JCDL 2001: 126-134
    • Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, Edward A. Fox Automatic document metadata extraction using support vector machines

    Images:
    • Generating fuzzy semantic metadata describing spatial relations from images using the R-histogram http://doi.acm.org/10.1145/996350.996396, Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries

  • Metadata Harvesting and Metasearching

  • Mobile platform DL usability - Kwan Weng Wah, Fan Peck Ling
    • Catherine C. Marshall, Christine Ruotolo: Reading-in-the-small: a study of reading on small form factor devices. 56-64 Electronic Edition (DOI: 10.1145/544220.544230)
    • Seeing the Whole in Parts: Text Summarization for Web Browsing on Handheld Devices http://citeseer.ist.psu.edu/buyukkokten00seeing.html
    • Jones, M., Buchanan, G., Thimbleby, H., Sorting out Searching on Small Screen Devices, Conference on Mobile HCI http://citeseer.ist.psu.edu/jones02sorting.html

  • Multilingual Text Segmentation - Low Jin Kiat
    ACL, Coling are good venues for this
    • Unsupervised Learning of Arabic Stemming Using a Parallel Corpus Monica Rogati, Scott McCarley and Yiming Yang, ACL 2003
    • Qiang Zhou: Local context templates for Chinese constituent boundary prediction. 975-981 Electronic Edition http://acl.ldc.upenn.edu/C/C00/C00-2141.pdf COLING 2000
    • Gary Kacmarcik, Chris Brockett, Hisami Suzuki: Robust Segmentation of Japanese Text into a Lattice for Parsing. 390-396 http://acl.ldc.upenn.edu/C/C00/C00-2141.pdf COLING 2000

  • Music in DLs - Melvin Yap, Tan Hoon Hoon
    Also find papers in: ISMIR, ACM MM
    Categorization:
    • G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Transactions on Speech and Audio Processing, 10(5), 2002.

    Querying:
    • Looking for new, not known music only: music retrieval by melody style Fang-Fei Kuo, Man-Kwan Shan Pages: 243 - 251, Query

    Indexing:
    • portal.acm.org/citation.cfm?id=827140.827143 Content-Based Indexing of Musical Scores

  • New Media for DL Blogging, IM, Wiki - Chong Kian Ming
    If you're doing this topic, you should choose one of the new types of media:
    Blogging:
    • Blogging by the Rest of Us. Diane Schiano, Bonnie Nardi, Michelle Gumbrecht and Luke Swartz, CHI 2004
    • BlogPulse: Automated Trend Discovery for Weblogs Natalie S. Glance, Matthew Hurst and Takashi Tomokiyo Intelliseek Applied Research Center
    • On the bursty evolution of blogspace Ravi Kumar, Jasmine Novak, Prabhakar Raghavan and Andrew Tomkins WWW 2003

    Instant Messaging:
    • Grinter, Rebecca and Leysia Palen (2002). Instant Messaging in Teen Life. In Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work (CSCW '02), New Orleans, Louisiana.
    • Isaacs, E., Walendowski, A., Whittaker, S., Schiano, D. and Kamm, C. (2002) The Character, Functions, and Styles of Instant Messaging in the Workplace. Proc. CSCW 2002. ACM Press (2002), 11-20.

  • Patterns of use in the DL / Web - Jon Tan
    • Diane Kelly, Colleen Cool: The effects of topic familiarity on information search behavior. 74-75 Electronic Edition (DOI: 10.1145/544220.544232), JCDL
    • Sharing encountered information: digital libraries get a social life http://doi.acm.org/10.1145/996350.996401, JCDL

    Query Analysis:
    • Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft Predicting query performance Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval http://doi.acm.org/10.1145/564376.564429
    • Understanding user goals in web search Daniel E. Rose, Danny Levinson Pages: 13 - 19, WWW 2004
    • An ethnographic study of technical support workers: why we didn't build a tech support digital library Sally Jo Cunningham, Chris Knowles, Nina Reeves Pages: 189 - 198

  • Phrasal Searching Techniques - Eileen Khoo
    • Using Common Hypertext Links to Identify the Best Phrasal Description of Target Web Documents Einat Amitay SIGIR '98
    • Efficient phrase querying with an auxiliary index, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval http://doi.acm.org/10.1145/564376.564415
    • An evaluation of phrasal and clustered representations on a text categorization task David D Lewis http://doi.acm.org/10.1145/133160.133172

  • *Question answering systems - Zhan Jiaming, Jiang Zheng Ping, Chen Chao
    A great list of reads on Bos and Webber's reading group (you have it a bit easier as someone has done some selection for you): http://www.iccs.informatics.ed.ac.uk/~jbos/qa/
    Also find papers in SIGIR
    • Lynette Hirschman and Rob Gaizauskas (2001): Natural Language Question Answering: The View from Here. Natural Language Engineering 7
    • Harabagiu, Moldovan, et al. (2001): FALCON: Boosting Knowledge for Answer Engines. In: Proceedings of The Ninth Text REtrieval Conference (TREC 9).
    • Structured use of external knowledge for event-based open domain question answering Hui Yang, Tat-Seng Chua, Shuguang Wang, Chun-Keat Koh Pages: 33 - 40

  • *Recommender Systems - Wong Kok Hoong, Li Qiang
    Also find papers in SIGIR, WWW, Machine Learning
    • Content-based filtering & collaborative filtering: An automatic weighting scheme for collaborative filtering Rong Jin, Joyce Y. Chai, Luo Si July 2004 Proceedings of the 27th annual international conference on Research and development in information retrieval
    • Shilling recommender systems for fun and profit Shyong K. Lam, John Riedl, Proceedings of the 13th international conference on World Wide Web (session: Reputation networks)

  • Speech in DLs

  • Spatial and Geographic data in DLs

  • Standards used in the DL Metadata and Markup - Cheng Weiwei
    There are too many metadata formats to overview successfully in a survey paper. You should concentrate on one or two that have a similar purpose and pursue these in depth.

    EAD:

    • Using the Open Archives Initiative Protocols with EAD Christopher J. Prom, Thomas G. Habing, JCDL 2002
    • EAD Development, http://www.loc.gov/ead/eaddev.html

    Dublin Core

    • A Quantitative Analysis of Unqualified Dublin Core Metadata Element Set Usage within Data Providers Registered with the Open Archives Initiative Jewel Ward, University of North Carolina at Chapel Hill, JCDL 2003 portal.acm.org/ft_gateway.cfm?id=827196&type=pdf
    • The Dublin Core and Warwick Framework A Review of the Literature, March 1995 - September 1997, D-Lib Magazine, January 1998.

  • Temporal data in DLs

  • Text classification for DLs - He Cong, Feng Chun
    • Using asymmetric distributions to improve text classifier probability estimates Paul N. Bennett, SIGIR 2003 http://doi.acm.org/10.1145/860435.860457 Pages: 111 - 118
    • Text categorization by boosting automatically extracted concepts Lijuan Cai, Thomas Hofmann Pages: 182 - 189 http://doi.acm.org/10.1145/860435.860470, SIGIR 2003
    • Web-page classification through summarization Dou Shen, Zheng Chen, Qiang Yang, Hua-Jun Zeng, Benyu Zhang, Yuchang Lu, Wei-Ying Ma Pages: 242 - 249 SIGIR 2004
    • Text genre classification with genre-revealing and subject-revealing features Yong-Bae Lee, Sung Hyon Myaeng, SIGIR 2002 Pages: 145 - 150 http://doi.acm.org/10.1145/564376.564403

  • Tools to build a DL

  • *User interfaces in DLs - Lee Chie Ping, Lee Sue Yin, Woo Wei Leng
    Others from Visualization Workshop in JCDL 2001
    • IdeaKeeper notepads: scaffolding digital library information analysis in online inquiry Conference on Human Factors in Computing Systems Extended abstracts of the 2004 conference on Human factors and computing systems http://doi.acm.org/10.1145/985921.986056

    Visualization of Scientific Research:
    • Katy Borner and Shashikant Penumarthy. Social Diffusion Patterns in Three-Dimensional Virtual Worlds. Information Visualization journal, vol. 2, no 3, pp. 182-198, 2003.

    Multimodal, media specific:
    • E.-P. Lim, D. H.-L. Goh, Z. Liu, W.-K. Ng, C. S.-G. Khoo, S. E. Higgins, G-Portal: A Map-based Digital Library for Distributed Geospatial and Georeferenced Resources, in: Proceedings of the Second ACM+IEEE Joint Conference on Digital Libraries (JCDL 2002)
    • Mor Naaman, Yee Jiun Song, Andreas Paepcke, and Hector Garcia-Molina Automatic organization for digital photographs with geographic coordinates. JCDL 2004 http://doi.acm.org/10.1145/996350.996366
    • Detecting and Browsing Events in Unstructured text David A. Smith, pg 73-80

    Reading:
    • Yi-Chun Chu, David Bainbridge, Matt Jones, and Ian H. Witten Realistic books: A bizarre homage to an obsolete medium? JCDL 2004

    General, Legacy:
    • B. Shneiderman, D. Feldman, A. Rose, and X.F. Grau, "Visualizing Digital Library Search Results with Categorical and Hierarchical Axes," in Proceedings of 5th ACM Digital Library Conference, 1999, ACM, pp. 57-65.

  • Video in DLs - Neo Shi Yong
    • Alan F. Smeaton, Indexing, browsing, and searching of digital video and digital audio information, Lectures on information retrieval, Springer-Verlag New York, Inc., New York, NY, 2001
    • Video retrieval using an MPEG-7 based inference network Andrew Graves, Mounia Lalmas Pages: 339 - 346 http://doi.acm.org/10.1145/564376.564436
    • The VISION Digital Video Library , Susan Gauch, Wei Li and John Gauch, Information Processing & Management, Vol. 33, No. 4, April 1997, pp. 413-426.

* - Denotes a closed survey paper topic. Students who haven't selected or need to reselect cannot choose these topics. You may want to choose areas in which other students are also doing a survey so that you may have some joint expertise if you're interested in doing a group project.

Requirements: Your survey paper should be no longer than six single-column, single-spaced pages. The more concisely you summarize the points, the more likely you are to receive a higher grade for the class.

Once you've chosen an area for your survey paper, I will help suggest two to three references that you can start with. Your responsibility is then to decide whether to accept my suggested papers and to supplement/replace the papers to round out your survey.

You can view the past grading criteria for this milestone. While this is no guarantee of how this semester's survey papers will be graded, they will be graded on similar criteria.

Min-Yen Kan <kanmy@comp.nus.edu.sg> Created on: Mon Dec 1 19:36:22 2003 | Version: 1.0 | Last modified: Fri Oct 22 11:24:20 2004