CS 6210 - Special Topics
in CS: Digital Libraries and
Computing for the Humanities
|
|
|
Orientation |
|
|
|
Module 0 Min-Yen KAN |
What is a library?
|
|
|
|
A place set apart to contain books for
reading, study, or reference. |
|
(Not applied, e.g. to the shop or
warehouse of a bookseller.) |
|
A building … containing a collection of
books for the use of the public or of some particular portion of it, or of
the members of some society or the like; |
|
a public institution or establishment,
charged with the care of a collection of books, and the duty of rendering the
books accessible to those who require to use them. |
What is a library?
|
|
|
A private commercial establishment for
the lending of books, the borrower paying either a fixed sum for each book
lent or a periodical subscription. |
|
a great mass of learning or knowledge; |
|
the objects of a person's study, the
sources on which he depends for instruction. |
|
Computers. An organized collection of
routines, esp. of tested routines suitable for a particular model of computer |
|
Biology. a collection of sequences of
DNA … that represent the genetic
material of a particular organism or tissue |
|
|
Introduction
|
|
|
|
Bush’s “As we may think” |
|
|
|
Writes this at the end of WW II |
|
_____ was the first computer, born to
compute ballistic tables fast |
|
______ just invented 5 years ago |
|
________ (“display technology”) still a
less than perfect process. |
|
_______ (“storage technology”) was a
mature and stable technology. |
Vannevar Bush (1890-1974)
|
|
|
|
Director of the Office of Scientific
Research and Development |
|
led 6000 scientists in R&D for
WWII |
|
Predicted many technological advances |
|
the “memex” is one whose spirit we are
implementing |
|
the purpose was to provide scientists
the capability to exchange information; to have access to the totality of
recorded information |
Design for Memex (c.
1945)
Memex
|
|
|
|
Integrated computer, keyboard, and desk |
|
“mechanized private file and library” |
|
remove drudgery from information
retrieval |
|
suggested implementation was microfilm |
|
various user operations are suggested |
|
________________ was the main purpose |
|
“the process of tying two items
together is the important thing” |
|
prelude to hypertext... |
Memex
|
|
|
|
Information could come
pre-associatively indexed, but the key point was ___ ___________ |
|
____ still does not provide that today |
|
Bush observes that tools change our way
of doing, and expand the horizons before us |
|
full impact of WWW and DLs still not
known |
|
|
What is a Digital Library
(DL)?
|
|
|
|
“a collection of information that is
both digitized and organized” (Lesk) |
|
there are a number of alternative
definitions, but this seems fair enough |
|
no mention of ________, __________,
__________, etc. |
|
|
|
The goal is not just to reform the current
library system; rather, we aim to |
|
organize and access the “information
overload” |
Outline for today
|
|
|
Introduction to libraries √ |
|
Course administration |
|
Reading and writing research |
|
To think about |
Course administration
|
|
|
Teaching staff |
|
Web sites |
|
Objective |
|
Syllabus |
|
Assessment overview |
|
Homework and discussions |
|
Survey paper and project |
|
|
|
Any questions? |
Teaching staff
|
|
|
|
Lecturer: |
|
Min-Yen Kan (“Min”) |
|
kanmy@comp.nus.edu.sg |
|
Office: S15 05-05 |
|
6875-1885 |
|
Hours: |
Course web sites
|
|
|
|
|
http://ivle.nus.edu.sg/ |
|
Discussion forum |
|
Any questions related to the course
should be raised on this forum |
|
Please do not send emails except for urgent
or personal matters |
|
Announcements |
|
Work bin: Lecture notes (incomplete!) |
|
|
|
http://www.comp.nus.edu.sg/~cs6210 |
|
Homework specification |
|
Other supplementary content |
Objective
|
|
|
|
Building, using and maintaining large
volumes of information |
|
Contrast computational approaches with
traditional library science methods |
|
|
|
Who? |
|
Advanced undergraduates and beginning
graduate students. Aimed at IS/CS students; others by permission. |
Syllabus
|
|
|
(S0. 6 Aug and S1. 13 Aug)
M0: Orientation; and M1: LIS crash course. |
|
(S2. 20 Aug)
M2: Multi-(media, lingual, access, needs). |
|
(S3. 27 Aug)
M3: Cataloging/indexing services. |
|
(S4. 3 Sep)
M4: Metadata creation and management. |
|
(S5. 10 Sep and S6. 17 Sep)
M5: Fundamentals of information retrieval. |
|
(S7. 24 Sep)
M6: Introduction to bibliometrics. |
Syllabus (2/2)
|
|
|
(S8. 1 Oct)
M7: Usability of OPACs and retrieval engines. |
|
(S9. 8 Oct)
M8: Computational literary analysis. |
|
(S10. 15 Oct)
M9: The problem of synonymy. |
|
(S11. 22 Oct)
M10: Topics in digital library policy. |
|
(S12. 29 Oct)
Final project poster presentations. |
Readings
|
|
|
|
Required textbook: |
|
Lesk (1999) Practical Digital Libraries |
|
|
|
Will be supplemented by readings and
excerpts from the following books: |
|
|
|
Baeza-Yates and Ribeiro-Neto (1999) Modern
Information Retrieval |
|
Witten, Bell and Moffat (2003) Managing
Gigabytes. |
|
Chakrabarti (2003) Mining the Web. |
|
Arms (2003) Digital Libraries. |
Discussions
|
|
|
Class participation is very important.
There are no “dumb” questions. You will only be penalized for “no” questions
/ comments. |
|
|
|
Possibilities: |
|
Name tags |
|
Cold calls |
|
Small group discussion and presentation |
Freedom of information
rule
|
|
|
Collaboration is acceptable |
|
|
|
To assure that all collaboration is on
the level, ________________________________________________________________________________________ |
|
|
|
You will be assessed on the parts that
you claim as your own contribution. |
Gilligan’s Island rule
|
|
|
|
You are free to meet with fellow
student(s) and discuss assignments with them. |
|
|
|
Writing on a board or shared piece of
paper is acceptable during the meeting; however, you ___________________________________________________________________. |
|
|
|
After the meeting, do something else
for at least a half-hour (watch an episode of Gilligan's Island), before
working on the assignment. |
|
This will assure that you are able to
reconstruct what you learned from the meeting, by yourself, using your own
brain. |
Assessment overview
|
|
|
|
Homework (2) @ 10% = 20% |
|
Discussion participation 10% |
|
Survey paper 20% |
|
Final project |
|
Presentation 10% |
|
Write-up / Deliverables 40% |
|
|
Homework and discussions
|
|
|
|
|
Homework |
|
Practical aspect of the course |
|
Two assessments, both individual: |
|
HW 1: query analysis |
|
HW 2: authorship detection |
|
|
|
Discussions |
|
Participation is key |
|
Come prepared (read ahead of time!) |
Literature survey
|
|
|
Each student will pick an area of study
to survey 3-5 papers in detail. |
|
|
|
Must be interesting to you |
|
Journal or conference papers from an
authority list |
|
Limit to 8 pages, better if just 5 |
|
Individual work only |
|
Give your perspective on area’s future |
Final project
|
|
|
Students will self-organize into groups
for the final projects, shortly after the survey papers are due. |
|
|
|
Requires original work |
|
Cooperation and coordination |
|
Report as a conference submission |
|
Poster presentation to the public |
|
Sample topics on the web page |
Outline for today
|
|
|
Introduction to libraries √ |
|
Course administration √ |
|
Reading and writing research |
|
To think about |
Reading and writing
research papers
|
|
|
References: |
|
|
|
http://www.cse.ogi.edu/~dylan/efficientReading.html |
|
|
|
ftp://fast.cs.utah.edu/pub/writing-papers.ps |
|
|
|
This section is partially from Surendar Chandra
of the University of Notre Dame. |
|
|
Why do you read a paper?
|
|
|
|
Understand and learn new contributions |
|
|
|
However… |
|
Not all papers are “good” |
|
Not all papers are “interesting” |
|
Not all papers are “worthwhile” for you |
|
|
|
You have to learn to identify a good
paper and spend your time wisely |
|
Breadth |
|
Depth |
|
React |
Reading a research paper
|
|
|
|
|
What is this paper about? |
|
Read the title and the abstract |
|
If you still don’t know what this paper
is about, then this is a poorly-written paper. |
|
Read the conclusion |
|
Are you now sure you know what this
paper is about? If not, paper. |
|
|
|
Read the _________ |
|
Read the ___________________ |
|
Read _________________________ |
How to read a paper
|
|
|
|
See who wrote it, where it was
published, and when it was written (credibility) |
|
Skim references |
|
Are the authors aware of relevant
related work? |
|
Do you know the work that they cite? |
|
Do you know other work that they should
have cited? |
How to read a paper -
depth
|
|
|
|
|
Approach with scientific skepticism |
|
Examine the assumptions: |
|
Do they rely on any uncertain trends? |
|
Are they reasonable? |
|
e.g., “Let’s assume that there are
billions of powerful computers, connected by a high speed network, spread
across the world, our system will …” |
|
e.g., “Our system functions in
real-time on a 33 MHz Intel 386 with 640K main memory running Windows 98” |
How to read a paper -
depth
|
|
|
|
|
Examine the methods: |
|
Did they measure what they claim? |
|
|
|
Can they explain what they observed? |
|
Want an analysis of why the system
behaves a certain way, not raw data. |
|
|
|
Did they have adequate controls? |
|
|
|
Were tests carried out in a standard
way? Were the performance metrics standard? |
|
If not, do they explain their metrics
clearly? |
How to read a paper -
depth
|
|
|
|
|
Examine the statistics:
“Lies, d*mned lies and statistics” |
|
Appropriate statistical tests applied
properly? |
|
Did they do proper error analysis? |
|
Are the results statistically
significant? |
|
Common mistake: “We performed our
experiment once at 4 am and noticed a tenfold improvement. Thus we conclude
that our system is better.” |
|
Be very careful with percentages |
|
Method A: 0.01 seconds, our Method:
0.005 seconds |
|
Our method shows 100% improvement over
method A!! |
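The percentage pitfall above can be checked with a few lines of arithmetic. The sketch below uses only the two timings from the slide; the function names are illustrative and not from any library.

```python
def time_reduction_pct(baseline_s: float, new_s: float) -> float:
    """Percentage of the baseline running time that was saved."""
    return (baseline_s - new_s) / baseline_s * 100.0


def speedup_factor(baseline_s: float, new_s: float) -> float:
    """How many times faster the new method is than the baseline."""
    return baseline_s / new_s


def throughput_gain_pct(baseline_s: float, new_s: float) -> float:
    """Percentage increase in work completed per unit of time."""
    return (speedup_factor(baseline_s, new_s) - 1.0) * 100.0


method_a = 0.01     # seconds, from the slide
our_method = 0.005  # seconds, from the slide

print(f"time reduction:  {time_reduction_pct(method_a, our_method):.0f}%")
print(f"speedup factor:  {speedup_factor(method_a, our_method):.1f}x")
print(f"throughput gain: {throughput_gain_pct(method_a, our_method):.0f}%")
```

The same pair of timings is a 50% reduction in running time, a 2x speedup, or a 100% gain in throughput, so a bare "100% improvement" is ambiguous unless the paper states which quantity it measures. An absolute saving of 0.005 seconds may also sit within measurement noise.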
How to read a paper -
depth
|
|
|
|
|
Examine the conclusions: |
|
Do the conclusions follow logically
from the observations? |
|
“We performed our experiments with 8
Palm Pilots and saw a tenfold improvement. Hence we conclude that our system
will scale to millions of Palm Pilots.” |
|
|
|
What other explanations are there for
the observed effects? |
|
What other conclusions or correlations
are there in the data that they did not point out? |
|
“Earlier work performed experiments
using a 2 Mbit wireless network. Our system (incidentally) used an 11 Mbit
network and saw a fivefold improvement. So our technique works!!” |
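The wireless example above can be made concrete by factoring out the confounding variable. This is a rough sketch that assumes, purely for illustration, that performance scales linearly with network bandwidth; real systems rarely behave that simply, and the function name is hypothetical.

```python
def fold_improvement(baseline: float, new: float) -> float:
    """Ratio of the new value to the baseline value (higher is better)."""
    return new / baseline


earlier_bandwidth_mbit = 2.0  # from the slide
our_bandwidth_mbit = 11.0     # from the slide
reported_improvement = 5.0    # the "fivefold improvement" from the slide

# Improvement expected from the faster network alone, under the
# illustrative assumption that performance is linear in bandwidth.
expected_from_network = fold_improvement(earlier_bandwidth_mbit, our_bandwidth_mbit)

# What remains for the technique itself once the network is factored out.
residual = reported_improvement / expected_from_network

print(f"expected from network alone: {expected_from_network:.1f}x")
print(f"residual for the technique:  {residual:.2f}x")
```

Under that assumption the network upgrade alone would account for a 5.5x gain, leaving a residual below 1x for the technique. The claim "our technique works" therefore needs a control run on the same network before it can be believed.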
How to read a paper -
react
|
|
|
|
Take notes |
|
Highlight major points |
|
React to the points in the paper |
|
Relate this work to your own
experience |
|
If you doubt a statement, note your
objection |
|
|
|
Summarize what you read |
|
Good practice: maintain your own
bibliography of all papers that you ever read |
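One possible way to keep such a bibliography is a small structured reading log. The sketch below is only an illustration: the `PaperNote` fields, the `find_by_keyword` helper, and the sample objection are all hypothetical, and it assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class PaperNote:
    """One entry in a personal reading log (field names are illustrative)."""
    authors: str
    year: int
    title: str
    venue: str
    summary: str = ""
    objections: list[str] = field(default_factory=list)


def find_by_keyword(notes: list[PaperNote], keyword: str) -> list[PaperNote]:
    """Return entries whose title or summary mentions the keyword."""
    needle = keyword.lower()
    return [
        n for n in notes
        if needle in n.title.lower() or needle in n.summary.lower()
    ]


log = [
    PaperNote(
        authors="Bush, V.",
        year=1945,
        title="As We May Think",
        venue="The Atlantic Monthly",
        summary="Proposes the memex, a mechanized private file and library.",
        # Sample reader reaction, invented for this illustration.
        objections=["Microfilm is the suggested medium; how would links be shared?"],
    )
]

for note in find_by_keyword(log, "memex"):
    print(f"{note.authors} ({note.year}). {note.title}. {note.venue}.")
```

Keeping the summary and objections next to the citation covers the "summarize" and "react" steps in one place, and a keyword search makes old notes easy to find when writing a survey.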
How to write a research
paper
|
|
|
Write it such that anyone who reads it
using the method we just discussed understands the idea |
|
|
|
Clearly explain what problem you are
solving, why it is interesting and how your solution solves this problem |
|
|
|
Be crisp. Explain what your
contributions are, what your ideas are, and what others’ ideas are |
Any questions?
|
|
|
Introduction to libraries √ |
|
Course administration √ |
|
Reading and writing research √ |
Survey on expectations
|
|
|
|
Go to IVLE and help me determine the
needs for this course |
|
|
|
Your background and knowledge |
|
Expectations on what you want to learn |
|
Optimal office hours |
|
|
|
Please complete before next lecture! |
To think about for
discussion
|
|
|
|
What are the functions of a traditional
library? |
|
Are these functions the same in the digital
library? |
|
How is the digital library different
from: |
|
_________? |
|
_________? |