CS 6210 - Special Topics in CS: Digital Libraries and
Computing for the Humanities
Orientation
Module 0                Min-Yen KAN

What is a library?
A place set apart to contain books for reading, study, or reference.
(Not applied, e.g. to the shop or warehouse of a bookseller.)
A building … containing a collection of books for the use of the public or of some particular portion of it, or of the members of some society or the like;
a public institution or establishment, charged with the care of a collection of books, and the duty of rendering the books accessible to those who require to use them.

What is a library?
A private commercial establishment for the lending of books, the borrower paying either a fixed sum for each book lent or a periodical subscription.
a great mass of learning or knowledge;
the objects of a person's study, the sources on which he depends for instruction.
Computers. An organized collection of routines, esp. of tested routines suitable for a particular model of computer
Biology. a collection of sequences of DNA …  that represent the genetic material of a particular organism or tissue

Introduction
Bush’s “As we may think”
Writes this at the end of WW II
_____ was the first computer, born to compute ballistic tables fast
______ just invented 5 years ago
________ (“display technology”) still a less than perfect process.
_______ (“storage technology”) was a mature and stable technology.

Vannevar Bush (1890-1974)
Director of the Office of Scientific Research and Development
led 6000 scientists in R&D for WWII
Predicted many technological advances
the “memex” is one whose spirit we are implementing
the purpose was to provide scientists the capability to exchange information; to have access to the totality of recorded information

Design for Memex (c. 1945)

Memex
Integrated computer, keyboard, and desk
“mechanized private file and library”
remove drudgery from information retrieval
suggested implementation was microfilm
various user operations  are suggested
________________ was the main purpose
“the process of tying two items together is the important thing”
prelude to hypertext...

Memex
Information could come pre-associatively indexed, but the key point was ___ ___________
____ still does not provide that today
Bush observes that tools change our way of doing, and expand the horizons before us
full impact of WWW and DLs still not known

What is a Digital Library (DL)?
“a collection of information that is both digitized and organized” (Lesk)
there are a number of alternative definitions, but this seems fair enough
no mention of ________, __________, __________, etc.
The goal is not just to reform the current library system; rather, we aim to organize and access the “information overload”

Outline for today
Introduction to libraries √
Course administration
Reading and writing research
To think about

Course administration
Teaching staff
Web sites
Objective
Syllabus
Assessment overview
Homework and discussions
Survey paper and project
Any questions?

Teaching staff
Lecturer:
Min-Yen Kan (“Min”)
kanmy@comp.nus.edu.sg
Office: S15 05-05
6875-1885
Hours:

Course web sites
http://ivle.nus.edu.sg/
Discussion forum
Any questions related to the course should be raised on this forum
Please do not send emails except for urgent or personal matters
Announcements
Work bin: Lecture notes (incomplete!)
http://www.comp.nus.edu.sg/~cs6210
Homework specification
Other supplementary content

Objective
Building, using and maintaining large volumes of information
Contrast computational approaches with traditional library science methods
Who?
Advanced undergraduates and beginning graduate students. Aimed at IS/CS students, or others by permission.

Syllabus
(S0. 6 Aug and S1. 13 Aug)
M0: Orientation; and M1: LIS crash course.
(S2. 20 Aug)
M2: Multi-(media, lingual, access, needs).
(S3. 27 Aug)
M3: Cataloging/indexing services.
(S4. 3 Sep)
M4: Metadata creation and management.
(S5. 10 Sep and S6. 17 Sep)
M5: Fundamentals of information retrieval.
(S7. 24 Sep)
M6: Introduction to bibliometrics.

Syllabus (2/2)
(S8. 1 Oct)
M7: Usability of OPACs and retrieval engines.
(S9. 8 Oct)
M8: Computational literary analysis.
(S10. 15 Oct)
M9: The problem of synonymy.
(S11. 22 Oct)
M10: Topics in digital library policy.
(S12. 29 Oct)
Final project poster presentations.

Readings
Required textbook:
Lesk (1999) Practical Digital Libraries
Will be supplemented by readings and excerpts from the following books:
Baeza-Yates and Ribeiro-Neto (1999) Modern Information Retrieval
Witten, Bell and Moffat (2003) Managing Gigabytes.
Chakrabarti (2003) Mining the Web.
Arms (2003) Digital Libraries.

Discussions
Class participation is very important. There are no “dumb” questions. You will only be penalized for “no” questions / comments.
Possibilities:
Name tags
Cold calls
Small group discussion and presentation

Freedom of information rule
Collaboration is acceptable
To assure that all collaboration is on the level, ________________________________________________________________________________________
You will be assessed on the parts that you claim as your own contribution.

Gilligan’s Island rule
You are free to meet with fellow student(s) and discuss assignments with them.
Writing on a board or shared piece of paper is acceptable during the meeting; however, you ___________________________________________________________________.
After the meeting, do something else for at least a half-hour (watch an episode of Gilligan's Island), before working on the assignment.
This will assure that you are able to reconstruct what you learned from the meeting, by yourself, using your own brain.

Assessment overview
Homeworks (2) @ 10% = 20%
Discussion participation 10%
Survey paper 20%
Final project
Presentation 10%
Write-up / Deliverables 40%

Homework and discussions
Homework
Practical aspect of the course
Two assessments, both individual:
HW 1: query analysis
HW 2: authorship detection
Discussions
Participation is key
Come prepared (read ahead of time!)

Literature survey
Each student will pick an area of study and survey 3-5 papers in detail.
Must be interesting to you
Journal or conference papers from an authority list
Limit to 8 pages, better if just 5
Individual work only
Give your perspective on area’s future

Final project
Students will self-organize into groups for the final projects, shortly after the survey papers are due.
Requires original work
Cooperation and coordination
Report as a conference submission
Poster presentation to the public
Sample topics on the web page

Outline for today
Introduction to libraries √
Course administration √
Reading and writing research
To think about

Reading and writing research papers
References:
http://www.cse.ogi.edu/~dylan/efficientReading.html
ftp://fast.cs.utah.edu/pub/writing-papers.ps
This section is partially from Surendar Chandra of the University of Notre Dame.

Why do you read a paper?
Understand and learn new contributions
However…
Not all papers are “good”
Not all papers are “interesting”
Not all papers are “worthwhile” for you
You have to learn to identify a good paper and spend your time wisely
Breadth
Depth
React

Reading a research paper
What is this paper about?
Read the title and the abstract
If you still don’t know what this paper is about, then this is a poorly-written paper.
 Read the conclusion
Are you now sure you know what this paper is about? If not, paper.
Read the _________
Read the ___________________
Read _________________________

How to read a paper
See who wrote it, where it was published, and when it was written (credibility)
Skim references
Are the authors aware of relevant related work?
Do you know the work that they cite?
Do you know other work that they should have cited?

How to read a paper - depth
Approach with scientific skepticism
Examine the assumptions:
Do they rely on any uncertain trends?
Are they reasonable?
e.g., “Let’s assume that there are billions of powerful computers, connected by a high speed network, spread across the world, our system will …”
e.g., “Our system functions in real-time on a 33 MHz Intel 386 with 640K main memory running Windows 98”

How to read a paper - depth
Examine the methods:
Did they measure what they claim?
Can they explain what they observed?
Want an analysis of why the system behaves a certain way, not raw data.
Did they have adequate controls?
Were tests carried out in a standard way? Were the performance metrics standard?
If not, do they explain their metrics clearly?

How to read a paper - depth
Examine the statistics:
“Lies, d*mned lies and statistics”
Appropriate statistical tests applied properly?
Did they do proper error analysis?
Are the results statistically significant?
Common mistake: “We performed our experiment once at 4 am and noticed a tenfold improvement. Thus we conclude that our system is better.”
Be very careful with percentages
Method A: 0.01 seconds, our Method: 0.005 seconds
Our method shows 100% improvement over method A!!
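The percentage example above can be checked with a short calculation. This is only an illustrative sketch: the function names are hypothetical, and the timings are the two values quoted on the slide. It shows that the same pair of measurements yields 50% when expressed as a reduction in running time and 100% when expressed as an increase in speed, so a bare percentage means little unless the baseline and the quantity being compared are stated.

```python
def time_reduction_pct(baseline_s, new_s):
    """Percentage of the baseline running time that was saved."""
    return (baseline_s - new_s) / baseline_s * 100.0


def speed_increase_pct(baseline_s, new_s):
    """Percentage increase in speed (work done per second) over the baseline."""
    return (baseline_s / new_s - 1.0) * 100.0


# Timings taken from the slide.
method_a_s = 0.01
our_method_s = 0.005

reduction = time_reduction_pct(method_a_s, our_method_s)
speed_gain = speed_increase_pct(method_a_s, our_method_s)
absolute_saving_s = method_a_s - our_method_s

print(f"Reduction in running time: {reduction:.0f}%")
print(f"Increase in speed:         {speed_gain:.0f}%")
print(f"Absolute saving:           {absolute_saving_s:.3f} seconds")
```

Both percentages are arithmetically correct, yet the absolute saving is five milliseconds, which may or may not matter for the application at hand.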

How to read a paper - depth
Examine the conclusions:
Do the conclusions follow logically from the observations?
e.g., “We performed our experiments with 8 Palm Pilots and saw a 10-fold improvement. Hence we conclude that our system will scale to millions of Palm Pilots.”
What other explanations are there for the observed effects?
What other conclusions or correlations are there in the data that they did not point out?
e.g., “Earlier work performed experiments using a 2 Mbit wireless network. Our system (incidentally) used an 11 Mbit network and saw a 5-fold improvement. So our technique works!!”
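The wireless network example hides a confound: the network itself got faster. A rough sanity check is sketched below. It rests on a strong assumption that is not in the original slide, namely that performance scales linearly with link bandwidth, and the function name is hypothetical. The sketch divides the reported improvement by the bandwidth ratio to see how much of the gain is left to credit to the technique.

```python
def gain_after_bandwidth(reported_gain, old_mbit, new_mbit):
    """Improvement factor left over once the faster link is accounted for.

    ASSUMPTION (for illustration only): performance scales linearly
    with link bandwidth. Real networks rarely behave this simply.
    """
    bandwidth_ratio = new_mbit / old_mbit
    return reported_gain / bandwidth_ratio


# Figures taken from the slide: 2 Mbit before, 11 Mbit after, 5-fold gain.
bandwidth_ratio = 11.0 / 2.0
residual_gain = gain_after_bandwidth(5.0, 2.0, 11.0)

print(f"Bandwidth ratio:             {bandwidth_ratio:.1f}x")
print(f"Gain left for the technique: {residual_gain:.2f}x")
```

Under this assumption the residual factor is below 1, meaning the faster network alone could more than account for the reported improvement. A fair comparison would rerun both systems on the same network.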

How to read a paper - react
Take notes
Highlight major points
React to the points in the paper
Relate this work to your own experience
If you doubt a statement, note your objection
Summarize what you read
Good practice: maintain your own bibliography of all papers that you ever read

How to write a research paper
Write it such that anyone who reads it using the method we just discussed understands the idea
Clearly explain what problem you are solving, why it is interesting and how your solution solves this problem
Be crisp. Explain what your contributions are, which ideas are yours and which are others’

Any questions?
Introduction to libraries √
Course administration √
Reading and writing research √

Survey on expectations
Go to IVLE and help me determine the needs for this course
Your background and knowledge
Expectations on what you want to learn
Optimal office hours
Please complete before next lecture!

To think about for discussion
What are the functions of a traditional library?
Are these the same functions in the digital library?
How is the digital library different from:
_________?
_________?