School of Computing
(Image courtesy kwl @ Flickr)

Digital Libraries

NUS SoC, 2015/2016, Semester I, Discussion Room 6 (COM1 02-12) / Mondays 12:00-14:00

Last Updated: Sunday, August 9, 2015 02:09:22 AM SGT. Preliminary project description ported from previous years. Subject to major revisions soon.

Project

Projects will be done individually. A good research project must (i) define a problem, (ii) propose a solution, (iii) implement the solution (simulated or real) and (iv) evaluate it against any applicable existing solutions or related work. For this course, the project must be related to the course subject, but with an area as broad as digital libraries, you should be able to find a topic that matches your interests and/or partially overlaps with your research work.

Your research project can take one of the following manifestations:

  • New research problem/solution - You define a new, interesting problem and propose a solution. Your solution does not have to set a high performance standard for the task, since you are pioneering a new area of research.
  • Existing research problem/new solution - You look at an existing, interesting problem and propose a novel solution that is better than existing solutions and that can lead to new ways of looking at and understanding the problem. Your solution doesn't have to outperform existing methods in all categories, but it should do so in at least some particular domain. For example, we are concerned with digital libraries in this course. It will suffice if your solution is statistically significantly better on typical documents in digital libraries, even if it is not better in the more general case.
  • Existing research problem/compare existing solutions - You look at an existing problem and its solutions. Implement the solutions, compare them and provide new insights into why one solution is better than another. Release public-domain software so that others can share and use your work. Continuing an Elsevier/NUS SgCodeJam24 submission is encouraged and is specifically sanctioned for this course.
  • Build an innovative system - Build a novel application that no one, or few, have built before. But most importantly, identify new issues in your system that no existing solutions can adequately solve.
  • Empirical analysis of some collected data - Researchers often need to build systems that actually solve or improve on real problems. Papers that analyze the usability of systems or characterize the data in some way help others understand the problem or the clientele (our users) for a particular problem.

Choosing a project

Below you will find a list of possible final projects. As this is a research seminar course, you will be primarily assessed on the work you do on the final project. As such, I expect and demand that each student or team of students achieve some novel research development or finding that is not a rehashing of the existing literature. The midterm survey paper is intended to foster this understanding and encourage you to poke into new territories.

You are welcome and encouraged to propose alternative projects. Your topic should blend together your strengths from your background, experience and current coursework, yet be applicable to digital libraries research. I have listed some ideas for projects in certain areas. Teams that take on projects that interest them and/or have relevance to their research or jobs always seem to do best. Some of the possible projects include (but are not limited to):

  • Social Network Analysis
  • Building a better citation parser
  • Web hyperlink classification
  • Exploring the relationships between prestige, authorities and hubs
  • Centrality and density of different genres of websites
  • Automatic computation of an area's journal and conference reputations
  • Access and Usability Issues
  • Multi-object summarization
  • The use of VR and immersive environments in the DL
  • Efficient social network visualization
  • Critique of current approaches in crosswalking of metadata
  • Novel querying tools for E-mail, blogs, and IM
  • Organizing photo and video content
  • User modeling
  • Classifying browsing and searching strategies based on information trails
  • Differences in retrieval effectiveness in speech queries as opposed to text/typed queries
  • Conceptual Search / Polysemy and synonymy
  • Query expansion and restriction from user query logs
  • Characterizing known item queries
  • Automatic jargon and terminology canonicalization
  • Classification and Filtering
  • Automatic ACM classification for theses and technical reports
  • Home page interest networking
  • Automatic ODP categorization for web sites
  • Threading and summarizing blog, email or IM searches
  • Digital Library Creation
  • GIS: Integration of maps at different scales
  • Inferring useful metadata for genres of web documents
  • Dateline and timeline history collection and canonicalization
  • Digital Library Cataloging and Indexing
  • Multimedia Metadata Features
  • Digital Library Policy
  • Exploring the integrity of skywriting/skyreading and its effect on scholarship
  • Cost models for the digital library in specialized domains/forms of media
  • Convenience, user rights and usability of linkages in the digital library
  • Authorship Analysis
  • Styles and Genres for authorship identification in web pages
  • Linkage styles and classification for webpage creators
  • Linking SMS and chat log short forms to long forms

I have listed some starting references for some of these topics. You may find it helpful to view past projects by previous students in earlier versions of this course, run in Semester I of 2003/04, 2004/05, 2005/06 and 2006/07.

Project proposal, write-up, presentation and grading

Proposal: Here are slides on how to do your project proposal.

Write-Up: Part of the skills that you should practice in a project-based graduate class is how to report your work. Expert researchers will tell you that half (if not most) of your time on a project will involve polishing your paper so it is easy to read and straightforward. Generally, filling up the page limit is easy, but deciding what to omit and how to succinctly express your idea is difficult.

Your write-up will take the form of a research paper intended for a conference submission with a 10 page limit. You should use an ACM proceedings style (you can follow the instructions for WWW 2010, for example). You may supplement this with a reference to your project's website / blog (if one was created) and any number of appendices that you feel will help determine a grade. Authors of selected final projects will be asked to submit their work to a relevant conference or journal, such as the ones listed on the miscellaneous page of this site.

Your project report is due first, by 11:59:59pm, 31 Oct (Monday). Standard late penalties apply, so please turn it in on time. You may optionally turn in the project a week earlier for extra credit from Min. This option allows Min to start grading projects early so that project assessments can be returned before the final exam.

Presentations: In the following week, we will meet in the evening of W13 for your project presentations. Presentations will run for ten minutes each with an additional five minutes for questions. The sign-up procedure will be announced later. Only a single group representative needs to be present. If no one from your group can make the project presentation timing, please let me know in person or by email.

Grading for the project's final report and presentation is likely to follow weights similar to the ones used in the previous version of this course, which had separate grading schemes for the presentation, for research projects and for implementation projects.

Final Workload Disclaimer

The project is the primary method by which you will be assessed in this course and, in its various forms, accounts for 60% of the marks for the class. The workload throughout the rest of the course is purposely light to ensure that you have enough time to produce high-quality research in the project. As such, you need to budget your team's time wisely and ensure that you have appropriately scoped your project and covered the topic with enough detail and with appropriate evaluation. You should adhere to and refine the project schedule included in your project proposal. Some students invariably start the project too late, or mismanage their time and neglect such open-ended course assignments in order to advance in classes that have more concrete assessment milestones. I warn you now to budget your time between classes wisely. As this is a four MC module, there are ten hours per week that a student should allot to this course. Eight of these are preparation time, and for this course the bulk of this time is intended for your project. Roughly speaking, you should invest about 9 weeks * 8 hours/week = 72 hours on your project.