1
Manual cataloging and indexing
  • Module 5 (Min-Yen KAN)
  • Heavily drawn from Lancaster (98), Indexing and Abstracting in Theory and Practice
2
Mesopotamian Catalogs
  • Mesopotamians kept track of their tablets with a list of their incipits:









  • What is it?
  • A poem?


3
Some Definitions
  • (Subject) Indexing
    • Assigning index terms to represent a document
    • Assists in document retrieval


  • Classification
    • Assigning a label to a document to assist in organizing that information
    • Not necessarily semantic labels
4
Steps in Subject Indexing
  • Conceptual analysis
    • Determine “aboutness”
    • Computational approaches: TF × IDF


  • Translation
    • Expressing the concepts as index terms
    • In controlled vocabularies, similar to Taylor’s (68) compromised need
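
As a rough illustration of the TF × IDF approach to "aboutness" named above, the sketch below weights each term by its count in a document times the log of its inverse document frequency. The function name `tf_idf`, the whitespace tokenizer, and the three toy documents are assumptions made for this example; real systems use many weighting variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy collection (hypothetical titles).
docs = [
    "the history of sociology",
    "the history of indexing",
    "the practice of indexing and abstracting",
]
weights = tf_idf(docs)

# The highest-weighted term is a candidate index term for the first document.
top_terms = sorted(weights[0], key=weights[0].get, reverse=True)[:1]
```

Terms that occur in every document (here "the" and "of") get weight zero, so they never surface as candidate index terms.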
5
Conceptual analysis
  • Generic: What is it about? What’s the main content?
    • e.g., The History of Sociology
  • Specific: Why has it been added to our collection? What aspects will our users be interested in?
    • cf. “Every reader his book”

  • Thus, organizations index differently
    • Different subjects (specialty, general interest)
    • Different materials (own materials, 3rd party)
6
Index terms
  • 1. Libraries
  • 2.
  • 3.
  • 4.
  • 5.
7
Number of index terms in record
  • Long (Exhaustive)
    • Gives good recall at cost of precision
    • Few records fit in the UI
    • Hard to figure out which are main aspects


  • Short (Selective)
    • Gives good precision at cost of recall
    • Less work


  • In practice: offer levels of indexing for tasks
    • Index Terms
    • Abstract
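
The recall/precision trade-off between exhaustive and selective indexing can be made concrete with a small sketch. The four records, their index terms, and the set of relevant documents are all hypothetical, and `search` and `precision_recall` are names chosen for this example.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def search(index, term):
    """Return the ids of all records indexed under the given term."""
    return [doc_id for doc_id, terms in index.items() if term in terms]

# Exhaustive indexing: every aspect of the document gets a term.
exhaustive = {
    "d1": {"sociology", "history", "education"},
    "d2": {"sociology", "statistics"},
    "d3": {"economics", "sociology"},
    "d4": {"history", "sociology"},  # sociology is only a passing mention
}

# Selective indexing: only the main aspects get a term.
selective = {
    "d1": {"sociology", "history"},
    "d2": {"sociology"},
    "d3": {"economics"},
    "d4": {"history"},
}

# Documents a user asking about sociology would actually want (assumed).
relevant = {"d1", "d2", "d3"}

exhaustive_pr = precision_recall(search(exhaustive, "sociology"), relevant)
selective_pr = precision_recall(search(selective, "sociology"), relevant)
```

With this toy data the exhaustive index retrieves everything relevant plus one marginal record, while the selective index retrieves only relevant records but misses one.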
8
Translation
  • Extraction: use terms directly from the source itself


  • Assignment: use terms from an outside source.
    • Usually from a controlled vocabulary.


9
Controlled vocabularies
  • Benefits
    • (Potentially) high precision, high recall
    • Question: which of these components is more important?


  • Drawbacks
    • Costly to construct and maintain
    • Difficult to use
      • Need CV knowledge
10
Controlled vocabulary objectives
  • Control / suggest synonyms, pick an authoritative term
    • Especially for entities: people (maiden names to married names), places (St. Petersburg)


  • Distinguish among homographs (e.g., mercury, turkey)


  • Link terms by their relationships: is-a (hierarchical) and all others (associative)
11
Controlled vocabulary usability
  • Good structure to find the appropriate term
    • Standard fields in a CV:
      • USE/UF: Use instead / Use For (authoritative)
      • BT/NT: Broader / Narrower Term in terms of hierarchy
      • RT: Related Term (Associative Term)


  • Applied by experienced personnel
    • A large vocabulary can be hard to map to


    • Question: What to do if the controlled vocabulary has no term for the concept to be indexed?
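
The standard fields listed above (USE/UF, BT/NT, RT) map naturally onto a small record type. The sketch below is one possible representation, not the format of any particular thesaurus; the `Term` class, the `authoritative` helper, and the three entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Term:
    label: str
    use: Optional[str] = None                           # USE: preferred term to use instead
    used_for: List[str] = field(default_factory=list)   # UF: non-preferred synonyms
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT

# Hypothetical entries for illustration only.
thesaurus: Dict[str, Term] = {
    "Leningrad": Term("Leningrad", use="St. Petersburg"),
    "St. Petersburg": Term(
        "St. Petersburg", used_for=["Leningrad"], broader=["Cities"]
    ),
    "Cities": Term("Cities", narrower=["St. Petersburg"]),
}

def authoritative(label, thesaurus):
    """Follow USE references to the preferred term.

    Returns None when the label is not in the vocabulary at all.
    """
    seen = set()
    while label in thesaurus and thesaurus[label].use is not None:
        if label in seen:
            raise ValueError("cyclic USE reference at " + label)
        seen.add(label)
        label = thesaurus[label].use
    return label if label in thesaurus else None
```

A `None` result corresponds to the question on this slide: the indexer has a concept for which the vocabulary offers no term, and the lookup alone cannot resolve that.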


12
Controlled vocabulary examples
  • General CVs
    • Sears List of Subject Headings
      • More general divisions, not intended for research libraries
      • Geared towards general subdivisions


    • Library of Congress Subject Headings (LCSH)
      • Comprehensive, very large, over five volumes
  • Domain-specific CV
    • Medical Subject Headings (MeSH)
      • Byproduct of indexing the National Library of Medicine (NLM) collection


    • Art & Architecture Thesaurus (AAT)
      • Object, images, architecture, styles

    • ERIC Thesaurus
      • Educational materials (journals, lesson plans and computer files)

13
Classification
14
Objectives of classification
  • Uniqueness
    • Be able to fetch a specific resource given a call number


  • Notational Permanence
    • (Seldom) have to reorganize/reassign labels
    • (e.g., paradigm shift in mathematics)


  • Comprehensive
    • Can successfully classify most things


  • Serendipity
    • Collocate related subjects together


  • Ease of Use
    • Ways of resolving ambiguities
    • (e.g., given religious architecture and Egyptian architecture, where does an article on the architecture of Egyptian temples go?)

15
Types of classification
  • Enumerative
    • Produce an alphabetical list of subject headings, assign numbers to each heading in alphabetical order


  • Hierarchical
    • Recursively divides subjects hierarchically, from most general to most specific


  • Faceted (analytico-synthetic):
    • Analytic: Divides subjects into mutually exclusive orthogonal facets
    • Synthetic: Combine facets to get a new class


  • From Taylor (92)
16
Dewey Decimal Classification
  • Divide knowledge into ten classes
  • Recursively divide these categories into ten (or fewer) classes
    • Assign another digit

  • What type of classification scheme is it?
  • 000 Generalities
  • 100 Philosophy & psychology
  • 200 Religion
  • 300 Social sciences
  • 400 Language
  • 500 Natural sciences & mathematics
  • 600 Technology (Applied sciences)
  • 700 The arts
  • 800 Literature & rhetoric
  • 900 Geography & history
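
Because each further digit subdivides its parent class, the chain of enclosing classes can be read directly off a Dewey number. The sketch below handles only the three digits before the decimal point; `ddc_path` and `main_class` are names made up for this example, and the labels are the ten main classes listed above.

```python
MAIN_CLASSES = {
    "0": "Generalities",
    "1": "Philosophy & psychology",
    "2": "Religion",
    "3": "Social sciences",
    "4": "Language",
    "5": "Natural sciences & mathematics",
    "6": "Technology (Applied sciences)",
    "7": "The arts",
    "8": "Literature & rhetoric",
    "9": "Geography & history",
}

def ddc_path(number):
    """Return the notations from the main class down to the given number.

    Only the three digits before any decimal point are considered.
    Levels that differ only by padding zeros are collapsed.
    """
    digits = number.split(".")[0]
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError("expected a three-digit class number: " + number)
    levels = [digits[0] + "00", digits[:2] + "0", digits]
    return list(dict.fromkeys(levels))  # drop repeats, keep order

def main_class(number):
    """Look up the main class label from the first digit."""
    return MAIN_CLASSES[number[0]]
```

For example, `ddc_path("025")` walks from the main class 000 through 020 to 025, and `main_class("516")` returns the label of the 500 class.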


17
ACM Classification scheme
  • Four-level tree (3 coded levels and a fourth uncoded level)


  • 16 General Terms
  • H. Information Systems
  • H.0 GENERAL
  • H.1 MODELS AND PRINCIPLES
  • H.2 DATABASE MANAGEMENT (E.5)
  • H.3 INFORMATION STORAGE AND RETRIEVAL
  • H.4 INFORMATION SYSTEMS APPLICATIONS
  • H.5 INFORMATION INTERFACES AND PRESENTATION (e.g., HCI) (I.7)
  • H.m MISCELLANEOUS
  • I. Computing Methodologies
  • I.0 GENERAL
  • I.1 SYMBOLIC AND ALGEBRAIC MANIPULATION
  • I.2 ARTIFICIAL INTELLIGENCE
  • I.3 COMPUTER GRAPHICS
  • I.4 IMAGE PROCESSING AND COMPUTER VISION
  • I.5 PATTERN RECOGNITION
  • I.6 SIMULATION AND MODELING (G.3)
  • I.7 DOCUMENT AND TEXT PROCESSING (H.4, H.5)
  • I.m MISCELLANEOUS
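
The dotted codes encode the tree directly, so a node's ancestors can be derived from its code alone. The sketch below stores a few of the nodes listed above and reads the parenthesized codes as cross-references to related nodes, which is an interpretation made for this example. The helper names are hypothetical, and the third-level code `H.3.1` is used only to show the shape of a deeper code.

```python
# code -> (label, cross-referenced codes), copied from the excerpt above.
ACM = {
    "H": ("Information Systems", []),
    "H.2": ("DATABASE MANAGEMENT", ["E.5"]),
    "H.3": ("INFORMATION STORAGE AND RETRIEVAL", []),
    "H.5": ("INFORMATION INTERFACES AND PRESENTATION", ["I.7"]),
    "I": ("Computing Methodologies", []),
    "I.6": ("SIMULATION AND MODELING", ["G.3"]),
    "I.7": ("DOCUMENT AND TEXT PROCESSING", ["H.4", "H.5"]),
}

def ancestors(code):
    """Return the codes of all enclosing nodes, most general first."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]

def see_also(code):
    """Return the cross-referenced codes recorded for a node."""
    return ACM[code][1]
```

Here `ancestors("H.3.1")` yields `["H", "H.3"]`, and `see_also("I.7")` yields the two H nodes listed next to I.7.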
18
Faceted Indexing
  • Facet – a characteristic of the resource (e.g., language)


  • Each facet organized hierarchically
    • allow drill-down browsing
    • represented by
      • set values (taxonomy)
      • continuous values (spectrum)
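
Drill-down browsing over facets can be sketched as successive filtering. In the example below a set-valued facet is stored as a path through its taxonomy and a continuous facet as a number; the records, facet names, and helper functions are all hypothetical.

```python
# Each record carries one taxonomy path (language) and one number (year).
records = [
    {"title": "A", "language": ("Indo-European", "Germanic", "English"), "year": 1998},
    {"title": "B", "language": ("Indo-European", "Romance", "French"), "year": 1968},
    {"title": "C", "language": ("Indo-European", "Germanic", "German"), "year": 1992},
]

def drill_down(records, facet, prefix):
    """Keep records whose taxonomy path for the facet starts with prefix."""
    prefix = tuple(prefix)
    return [r for r in records if r[facet][:len(prefix)] == prefix]

def in_range(records, facet, low, high):
    """Keep records whose continuous facet value lies in [low, high]."""
    return [r for r in records if low <= r[facet] <= high]

# Narrow by one facet, then by another.
germanic = drill_down(records, "language", ("Indo-European", "Germanic"))
recent_germanic = in_range(germanic, "year", 1995, 2000)
```

Each additional prefix element or tighter range narrows the result set, which is what makes the facets browsable in any order.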
19
Colon Classification
  • Ranganathan proposed 5 basic facets (PMEST):
    • Personality – the subject matter
    • Matter – the material
    • Energy – process or action
    • Space
    • Time


  • Each facet would have its own classification schedule


  • String together notation to get classification number



  • Example:
  • The design of wooden furniture in 18th century America
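
Stringing facet notations together can be sketched as below. The connecting symbols follow the convention commonly described for Colon Classification (comma, semicolon, colon, dot, apostrophe for P, M, E, S, T), simplified here so that every facet gets its connector. The facet codes `P1`, `M1`, and so on are placeholders, not values from any real schedule, and the assignment of the example's concepts to facets is one plausible reading.

```python
PMEST_ORDER = ["personality", "matter", "energy", "space", "time"]

# Connector that introduces each facet (simplified convention, see lead-in).
CONNECTOR = {
    "personality": ",",
    "matter": ";",
    "energy": ":",
    "space": ".",
    "time": "'",
}

def synthesize(facets):
    """Join the facet codes that are present, always in PMEST order."""
    parts = []
    for name in PMEST_ORDER:
        if name in facets:
            parts.append(CONNECTOR[name] + facets[name])
    return "".join(parts)

# "The design of wooden furniture in 18th century America",
# with placeholder codes standing in for real schedule notations.
example = {
    "personality": "P1",  # furniture
    "matter": "M1",       # wood
    "energy": "E1",       # design
    "space": "S1",        # America
    "time": "T1",         # 18th century
}
notation = synthesize(example)
```

Because the order is fixed by the scheme and not by the input, two indexers who pick the same facet values arrive at the same string.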



20
To think about…
  • Now that we have free-text searching, do you feel controlled vocabularies are still necessary or not? What do you feel their impact will be in the future of the digital library?


  • How would you improve the ACM classification scheme? How would you deal with legacy schemes?


  • Booksellers also need to use classification to shelve books.  Which type of classification do you think booksellers use?  Would you make any adaptations to the classification schemes shown today?