1
CS 6210 - Special Topics in CS: Digital Libraries and
Computing for the Humanities
  • Orientation


  • Module 0
  • Min-Yen KAN
2
What is a library?
  • A place set apart to contain books for reading, study, or reference.
    • (Not applied, e.g. to the shop or warehouse of a bookseller.)
  • A building … containing a collection of books for the use of the public or of some particular portion of it, or of the members of some society or the like;
  • a public institution or establishment, charged with the care of a collection of books, and the duty of rendering the books accessible to those who require to use them.
3
What is a library?
  • A private commercial establishment for the lending of books, the borrower paying either a fixed sum for each book lent or a periodical subscription.
  • a great mass of learning or knowledge;
  • the objects of a person's study, the sources on which he depends for instruction.
  • Computers. An organized collection of routines, esp. of tested routines suitable for a particular model of computer
  • Biology. a collection of sequences of DNA …  that represent the genetic material of a particular organism or tissue


4
Introduction
  • Bush’s “As we may think”


    • Writes this at the end of WW II
    • _____ was the first computer, born to compute ballistic tables fast
    • ______ just invented 5 years ago
    • ________ (“display technology”) still a less than perfect process.
    • _______ (“storage technology”) was a mature and stable technology.
5
Vannevar Bush (1890-1974)
  • Director of the Office of Scientific Research and Development
    • led 6000 scientists in R&D for WWII
  • Predicted many technological advances
    • the “memex” is one whose spirit we are implementing
    • the purpose was to provide scientists the capability to exchange information; to have access to the totality of recorded information
6
Design for Memex (c. 1945)
7
Memex
  • Integrated computer, keyboard, and desk
  • “mechanized private file and library”
    • remove drudgery from information retrieval
    • suggested implementation was microfilm
    • various user operations  are suggested
  • ________________ was the main purpose
    • “the process of tying two items together is the important thing”
    • prelude to hypertext...
8
Memex
  • Information could come pre-associatively indexed, but the key point was ___ ___________
    • ____ still does not provide that today
  • Bush observes that tools change our way of doing, and expand the horizons before us
    • full impact of WWW and DLs still not known

9
What is a Digital Library (DL)?
  • “a collection of information that is both digitized and organized” (Lesk)
    • there are a number of alternative definitions, but this seems fair enough
    • no mention of ________, __________, __________, etc.


  • The goal is not just to reform the current library system; rather, we aim to
    • organize and access the “information overload”
10
Outline for today
  • Introduction to libraries √
  • Course administration
  • Reading and writing research
  • To think about
11
Course administration
  • Teaching staff
  • Web sites
  • Objective
  • Syllabus
  • Assessment overview
  • Homework and discussions
  • Survey paper and project


  • Any questions?
12
Teaching staff
  • Lecturer:
    • Min-Yen Kan (“Min”)
    • kanmy@comp.nus.edu.sg
    • Office: S15 05-05
    • 6875-1885
    • Hours:
13
Course web sites
  • http://ivle.nus.edu.sg/
    • Discussion forum
      • Any questions related to the course should be raised on this forum
      • Please do not send emails except for urgent or personal matters
    • Announcements
    • Work bin: Lecture notes (incomplete!)


  • http://www.comp.nus.edu.sg/~cs6210
    • Homework specification
    • Other supplementary content
14
Objective
  • Building, using and maintaining large volumes of information
  • Contrast computational approaches with traditional library science methods


  • Who?
    • Advanced undergraduates and beginning graduate students. Aimed at IS/CS students; others by permission.
15
Syllabus
  • (S0. 6 Aug and S1. 13 Aug)
    M0: Orientation; and M1: LIS crash course.
  • (S2. 20 Aug)
    M2: Multi-(media, lingual, access, needs).
  • (S3. 27 Aug)
    M3: Cataloging/indexing services.
  • (S4. 3 Sep)
    M4: Metadata creation and management.
  • (S5. 10 Sep and S6. 17 Sep)
    M5: Fundamentals of information retrieval.
  • (S7. 24 Sep)
    M6: Introduction to bibliometrics.
16
Syllabus (2/2)
  • (S8. 1 Oct)
    M7: Usability of OPACs and retrieval engines.
  • (S9. 8 Oct)
    M8: Computational literary analysis.
  • (S10. 15 Oct)
    M9: The problem of synonymy.
  • (S11. 22 Oct)
    M10: Topics in digital library policy.
  • (S12. 29 Oct)
    Final project poster presentations.
17
Readings
  • Required textbook:
    • Lesk (1999) Practical Digital Libraries

  • Will be supplemented by readings and excerpts from the following books:
    • Baeza-Yates and Ribeiro-Neto (1999) Modern Information Retrieval
    • Witten, Bell and Moffat (2003) Managing Gigabytes.
    • Chakrabarti (2003) Mining the Web.
    • Arms (2003) Digital Libraries.
18
Discussions
  • Class participation is very important. There are no “dumb” questions. You will only be penalized for “no” questions / comments.


  • Possibilities:
  • Name tags
  • Cold calls
  • Small group discussion and presentation
19
Freedom of information rule
  • Collaboration is acceptable


  • To assure that all collaboration is on the level, ________________________________________________________________________________________


  • You will be assessed on the parts that you claim as your own contribution.
20
Gilligan’s Island rule
  • You are free to meet with fellow student(s) and discuss assignments with them.


  • Writing on a board or shared piece of paper is acceptable during the meeting; however, you ___________________________________________________________________.


  • After the meeting, do something else for at least a half-hour (watch an episode of Gilligan's Island), before working on the assignment.
    • This will assure that you are able to reconstruct what you learned from the meeting, by yourself, using your own brain.
21
Assessment overview
  • Homework (2) @ 10% each = 20%
  • Discussion participation 10%
  • Survey paper 20%
  • Final project
    • Presentation 10%
    • Write-up / Deliverables 40%

22
Homework and discussions
  • Homework
    • Practical aspect of the course
    • Two assessments, both individual:
      • HW 1: query analysis
      • HW 2: authorship detection

  • Discussions
    • Participation is key
    • Come prepared (read ahead of time!)
23
Literature survey
  • Each student will pick an area of study and survey 3-5 papers from it in detail.


  • Must be interesting to you
  • Journal or conference papers from an authority list
  • Limit to 8 pages, better if just 5
  • Individual work only
  • Give your perspective on area’s future
24
Final project
  • Students will self-organize into groups for the final projects, shortly after the survey papers are due.


  • Requires original work
  • Cooperation and coordination
  • Report as a conference submission
  • Poster presentation to the public
  • Sample topics on the web page
25
Outline for today
  • Introduction to libraries √
  • Course administration √
  • Reading and writing research
  • To think about
26
Reading and writing research papers
  • References:


  •  http://www.cse.ogi.edu/~dylan/efficientReading.html


  •  ftp://fast.cs.utah.edu/pub/writing-papers.ps


  • This section is partially from Surendar Chandra
    of the University of Notre Dame.


27
Why do you read a paper?
  • Understand and learn new contributions


  • However…
    • Not all papers are “good”
    • Not all papers are “interesting”
    • Not all papers are “worthwhile” for you


  • You have to learn to identify a good paper and spend your time wisely
    • Breadth
    • Depth
    • React
28
Reading a research paper
  • What is this paper about?
    • Read the title and the abstract
      • If you still don’t know what this paper is about, then this is a poorly-written paper.
    • Read the conclusion
      • Are you now sure you know what this paper is about? If not, this is a poorly-written paper.


  • Read the _________
  • Read the ___________________
  • Read _________________________
29
How to read a paper
  • See who wrote it, where it was published, and when it was written (credibility)
  • Skim references
    • Are the authors aware of relevant related work?
    • Do you know the work that they cite?
    • Do you know other work that they should have cited?
30
How to read a paper - depth
  • Approach with scientific skepticism
  • Examine the assumptions:
    • Do they rely on any uncertain trends?
    • Are they reasonable?
      • e.g., “Let’s assume that there are billions of powerful computers, connected by a high speed network, spread across the world, our system will …”
      • e.g., “Our system functions in real-time on a 33 MHz Intel 386 with 640K main memory running Windows 98”
31
How to read a paper - depth
  • Examine the methods:
    • Did they measure what they claim?


    • Can they explain what they observed?
      • Want an analysis of why the system behaves a certain way, not raw data.


    • Did they have adequate controls?


    • Were tests carried out in a standard way? Were the performance metrics standard?
      • If not, do they explain their metrics clearly?
32
How to read a paper - depth
  • Examine the statistics:
    “Lies, d*mned lies and statistics”
    • Appropriate statistical tests applied properly?
    • Did they do proper error analysis?
    • Are the results statistically significant?
      • Common mistake: “We performed our experiment once at 4 am and noticed a tenfold improvement. Thus we conclude that our system is better.”
    • Be very careful with percentages
      • Method A: 0.01 seconds, our Method: 0.005 seconds
      • Our method shows 100% improvement over method A!!
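
The percentage claim above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the two timings are the ones quoted on the slide. It shows that the same pair of measurements gives a 50% reduction in running time, a 2x speedup, or a "100% improvement" in work done per second, depending on which quantity is put in the denominator.

```python
def percent_time_reduction(baseline_s: float, new_s: float) -> float:
    """Percentage by which running time dropped, relative to the baseline."""
    return (baseline_s - new_s) / baseline_s * 100.0


def speedup_factor(baseline_s: float, new_s: float) -> float:
    """How many times faster the new method is than the baseline."""
    return baseline_s / new_s


def percent_rate_gain(baseline_s: float, new_s: float) -> float:
    """Percentage increase in work completed per unit time."""
    return (speedup_factor(baseline_s, new_s) - 1.0) * 100.0


# Timings taken from the slide (seconds).
method_a_s = 0.01
our_method_s = 0.005

print(f"time reduction: {percent_time_reduction(method_a_s, our_method_s):.1f}%")
print(f"speedup factor: {speedup_factor(method_a_s, our_method_s):.1f}x")
print(f"rate gain:      {percent_rate_gain(method_a_s, our_method_s):.1f}%")
```

When reading a paper, check which of these quantities the authors report; all three describe the same two measurements, and none says anything about whether a 5 millisecond difference matters in practice.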
33
How to read a paper - depth
  • Examine the conclusions:
    • Do the conclusions follow logically from the results?
      • “We performed our experiments with 8 Palm Pilots and saw a 10-fold improvement. Hence we conclude that our system will scale to millions of Palm Pilots.”


    • What other explanations are there for the observed effects?
    • What other conclusions or correlations are there in the data that they did not point out?
      • “Earlier work performed experiments using a 2 Mbit wireless network. Our system (incidentally) used an 11 Mbit network and saw a 5-fold improvement. So our technique works!!”
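
One way to probe the network example above is to divide the reported improvement by the change in the confounding factor. This is a rough sketch under a loud assumption: that performance scales linearly with network bandwidth, which the slide does not state and which may not hold for a real system. The function name is hypothetical.

```python
def residual_speedup(observed_speedup: float, confound_ratio: float) -> float:
    """Speedup left over after dividing out a confounding factor.

    Assumes (hypothetically) that performance scales linearly with the
    confounding factor, here network bandwidth.
    """
    return observed_speedup / confound_ratio


# Figures from the slide: 2 Mbit network in earlier work, 11 Mbit here.
bandwidth_ratio = 11 / 2      # the newer network is 5.5x faster
observed_speedup = 5.0        # the reported "5-fold improvement"

residual = residual_speedup(observed_speedup, bandwidth_ratio)
print(f"bandwidth ratio:  {bandwidth_ratio:.2f}x")
print(f"residual speedup: {residual:.2f}x")

if residual <= 1.0:
    print("The faster network alone could account for the whole improvement.")
```

Under that linear assumption the residual is below 1, so the faster network by itself is a sufficient alternative explanation, and the paper would need a controlled comparison on the same network to support its claim.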
34
How to read a paper - react
  • Take notes
  • Highlight major points
  • React to the points in the paper
    • Relate this work to your own experience
    • If you doubt a statement, note your objection

  • Summarize what you read
    • Good practice: maintain your own bibliography of all papers that you ever read
35
How to write a research paper
  • Write it such that anyone who reads it using the method we just discussed understands the idea


  • Clearly explain what problem you are solving, why it is interesting and how your solution solves this problem


  • Be crisp. Explain what your contributions are, which ideas are yours and which are others’
36
Any questions?
  • Introduction to libraries √
  • Course administration √
  • Reading and writing research √
37
Survey on expectations
  • Go to IVLE and help me determine the needs for this course


    • Your background and knowledge
    • Expectations on what you want to learn
    • Optimal office hours

  • Please complete before next lecture!
38
To think about for discussion
  • What are the functions of a traditional library?
  • Are these the same functions as in the digital library?
  • How is the digital library different from:
    • _________?
    • _________?