|
1
|
- Module 5 Min-Yen KAN
- *heavily drawn from Lancaster (98) Indexing and Abstracting in Theory
and Practice
|
|
2
|
- Mesopotamians kept track of their tablets with a list of their incipits:
- What is it?
- A poem?
|
|
3
|
- (Subject) Indexing
- Assigning index terms to represent a document
- Assists in document retrieval
- Classification
- Assigning a label to a document to assist in organizing that
information
- Not necessarily semantic labels
|
|
4
|
- Conceptual analysis
- Determine “aboutness”
- Computational approaches: TF × IDF
- Translation
- Expressing the concepts as index terms
- In controlled vocabularies, similar to Taylor’s (68) compromised need
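The TF × IDF approach to "aboutness" can be sketched in a few lines. This is a minimal illustration, assuming raw term counts for TF and log(N / df) for IDF, which is only one of several common variants. The toy documents and the function name `tf_idf` are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes TF = raw count and IDF = log(N / df); other variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [
    "indexing assigns index terms to a document",
    "classification assigns a label to a document",
    "a thesaurus links index terms",
]
weights = tf_idf(docs)
# Candidate index term for each document: its highest-weighted term.
top_terms = [max(w, key=w.get) for w in weights]
```

A term that occurs in every document (here "a") gets weight zero, so it is never proposed as an index term, while a term confined to one document scores highest there.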
|
|
5
|
- Generic: What is it about? What is the main content?
- e.g., The History of Sociology
- Specific: Why has it been added to our collection? What aspects will our
users be interested in?
- cf. “Every reader his book”
- Thus, organizations index differently
- Different subjects (specialty, general interest)
- Different materials (own materials, 3rd party)
|
|
6
|
|
|
7
|
- Long (Exhaustive)
- Gives good recall at cost of precision
- Few records fit in the UI
- Hard to figure out which are main aspects
- Short (Selective)
- Gives good precision at cost of recall
- Less work
- In practice: offer levels of indexing for tasks
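The precision/recall trade-off between long and short index records can be made concrete with a small calculation. The document IDs and result sets below are hypothetical, chosen only to show the direction of the effect.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Documents a user would actually judge relevant to the query (made up).
relevant = {"d1", "d2", "d3", "d4"}

# Exhaustive indexing: many terms per document, so the query matches more documents.
exhaustive_hits = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
# Selective indexing: only main topics are indexed, so the query matches fewer.
selective_hits = {"d1", "d2"}

p_long, r_long = precision_recall(exhaustive_hits, relevant)    # (0.5, 1.0)
p_short, r_short = precision_recall(selective_hits, relevant)   # (1.0, 0.5)
```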
|
|
8
|
- Extraction: use terms directly from the source itself
- Assignment: use terms from an outside source.
- Usually from a controlled vocabulary.
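The difference between the two approaches can be sketched as follows. This is an illustration only: the stopword list, the mapping `ENTRY_VOCABULARY`, and both function names are hypothetical, and real systems use far richer methods for either step.

```python
from collections import Counter

STOPWORDS = {"the", "of", "a", "and", "in", "to", "is", "for"}  # tiny illustrative list

def tokenize(text):
    return [w.strip(".,").lower() for w in text.split()]

def extract_terms(text, k=3):
    """Extraction: index terms are the k most frequent non-stopwords in the text."""
    counts = Counter(w for w in tokenize(text) if w and w not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

# Hypothetical mapping from words seen in text to controlled vocabulary terms.
ENTRY_VOCABULARY = {
    "cars": "Automobiles",
    "automobiles": "Automobiles",
    "engines": "Motors",
    "motors": "Motors",
}

def assign_terms(text):
    """Assignment: index terms come from the outside vocabulary, not the text."""
    return sorted({ENTRY_VOCABULARY[w] for w in tokenize(text) if w in ENTRY_VOCABULARY})

text = "The repair of cars and the tuning of engines in cars."
extracted = extract_terms(text)
assigned = assign_terms(text)
```

Extracted terms are always words from the document itself, whereas the assigned terms ("Automobiles", "Motors") never appear in this text at all.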
|
|
9
|
- Benefits
- (Potentially) high precision, high recall
- Question: which of these components is more important?
- Drawbacks
- Costly to construct and maintain
- Is difficult to use
|
|
10
|
- Control / suggest synonyms, pick an authoritative term
- Especially for entities: people (maiden names to married names), places
(St. Petersburg)
- Distinguish among homographs (e.g., mercury, turkey)
- Link terms with their relationship (is-a and all others (associative))
|
|
11
|
- Good structure to find the appropriate term
- Standard fields in a CV:
- USE/UF: Use instead / Use For (authoritative)
- BT/NT: Broader / Narrower Term in terms of hierarchy
- RT: Related Term (Associative Term)
- Applied by experienced personnel
- A large vocabulary can be hard to map to
- Question: What to do if the controlled vocabulary has no term for the
concept to be indexed?
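The standard fields can be modelled as a small data structure. The sketch below is one possible layout, not a standard format: the class names `Term` and `Thesaurus`, the method names, and the sample entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """One authoritative term with the standard thesaurus fields."""
    name: str
    used_for: list = field(default_factory=list)   # UF: non-preferred synonyms
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT

class Thesaurus:
    def __init__(self, terms):
        self.terms = {t.name: t for t in terms}
        # USE references are the inverse of UF: synonym -> authoritative term.
        self.use = {syn: t.name for t in terms for syn in t.used_for}

    def resolve(self, label):
        """Return the authoritative term for a label, or None if unknown."""
        if label in self.terms:
            return label
        return self.use.get(label)

    def expand(self, label):
        """Authoritative term plus all of its narrower terms, recursively."""
        root = self.resolve(label)
        if root is None:
            return []
        result, stack = [], [root]
        while stack:
            name = stack.pop()
            if name in result:
                continue
            result.append(name)
            if name in self.terms:
                stack.extend(self.terms[name].narrower)
        return result

cv = Thesaurus([
    Term("Cities", narrower=["St. Petersburg"]),
    Term("St. Petersburg", used_for=["Leningrad", "Petrograd"], broader=["Cities"]),
    # Homographs are kept apart by a parenthetical qualifier.
    Term("Mercury (planet)", broader=["Planets"]),
    Term("Mercury (element)", broader=["Metals"]),
])
```

Here `cv.resolve("Leningrad")` leads to the authoritative "St. Petersburg". A label with no entry simply resolves to `None`; what an indexer should do in that case is left open, as in the question above.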
|
|
12
|
- General CVs
- Sears List of Subject Headings
- More general divisions, not intended for research libraries
- Geared towards general subdivisions
- Library of Congress Subject Headings (LCSH)
- Comprehensive, very large, over five volumes
- Domain-specific CV
- Medical Subject Headings (MeSH)
- Byproduct of indexing the NLM
- Art & Architecture Thesaurus (AAT)
- Object, images, architecture, styles
- ERIC Thesaurus
- Educational materials (journals, lesson plans and computer files)
|
|
13
|
|
|
14
|
- Uniqueness
- Be able to fetch a specific resource given a call number
- Notational Permanence
- (Seldom) have to reorganize/reassign labels
- (e.g., paradigm shift in mathematics)
- Comprehensive
- Can successfully classify most things
- Serendipity
- Collocate related subjects together
- Ease of Use
- Ways of resolving ambiguities
- (e.g., given religious architecture and Egyptian architecture, where
does an article on the architecture of Egyptian temples go?)
|
|
15
|
- Enumerative
- Produce an alphabetical list of subject headings, assign numbers to
each heading in alphabetical order
- Hierarchical
- Recursively divides subjects hierarchically, from most general to most
specific
- Faceted (analytico-synthetic):
- Analytic: Divides subjects into mutually exclusive orthogonal facets
- Synthetic: Combine facets to get a new class
- From Taylor (92)
|
|
16
|
- Divide knowledge into ten classes
- Recursively divide these categories into ten (or fewer) classes
- What type of classification scheme is it?
- 000 Generalities
- 100 Philosophy & psychology
- 200 Religion
- 300 Social sciences
- 400 Language
- 500 Natural sciences & mathematics
- 600 Technology (Applied sciences)
- 700 The arts
- 800 Literature & rhetoric
- 900 Geography & history
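Because each class is divided by ten again, the leading digits of a number already identify its place in the scheme. A minimal sketch, assuming plain decimal call numbers such as "025.3" (the helper names are hypothetical, and only the ten main classes listed above are included):

```python
DDC_MAIN_CLASSES = {
    0: "Generalities",
    1: "Philosophy & psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Natural sciences & mathematics",
    6: "Technology (Applied sciences)",
    7: "The arts",
    8: "Literature & rhetoric",
    9: "Geography & history",
}

def main_class(call_number):
    """Map a number such as '025.3' to its main class, e.g. '000 Generalities'."""
    hundreds = int(float(call_number)) // 100
    return f"{hundreds}00 {DDC_MAIN_CLASSES[hundreds]}"

def ancestors(call_number):
    """Return the three-digit prefixes from most general to most specific."""
    digits = call_number.split(".")[0].zfill(3)
    return [digits[0] + "00", digits[:2] + "0", digits]
```

For example, `ancestors("025.3")` gives `["000", "020", "025"]`, and `main_class("510")` gives `"500 Natural sciences & mathematics"`.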
|
|
17
|
- Four-level tree (3 coded levels and a fourth uncoded level)
- 16 General Terms
- H. Information Systems
- H.0 GENERAL
- H.1 MODELS AND PRINCIPLES
- H.2 DATABASE MANAGEMENT (E.5)
- H.3 INFORMATION STORAGE AND RETRIEVAL
- H.4 INFORMATION SYSTEMS APPLICATIONS
- H.5 INFORMATION INTERFACES AND PRESENTATION (e.g., HCI) (I.7)
- H.m MISCELLANEOUS
- I. Computing Methodologies
- I.0 GENERAL
- I.1 SYMBOLIC AND ALGEBRAIC MANIPULATION
- I.2 ARTIFICIAL INTELLIGENCE
- I.3 COMPUTER GRAPHICS
- I.4 IMAGE PROCESSING AND COMPUTER VISION
- I.5 PATTERN RECOGNITION
- I.6 SIMULATION AND MODELING (G.3)
- I.7 DOCUMENT AND TEXT PROCESSING (H.4, H.5)
- I.m MISCELLANEOUS
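Entries in this layout can be split into a code, a label, and cross-references to other categories. The parser below is a rough sketch that assumes the trailing parenthesis contains only second-level codes (a letter A to K, a dot, then a digit or "m"). The function names and the regular expression are illustrative, not part of the scheme.

```python
import re

CODE = r"[A-K]\.(?:\d|m)"   # assumed shape of a second-level code, e.g. H.3 or I.m

def parse_entry(line):
    """Split an entry into (code, label, cross_references)."""
    code, rest = line.split(" ", 1)
    # A trailing parenthesis holding only codes is treated as a cross-reference list.
    match = re.search(rf"\s*\(({CODE}(?:, {CODE})*)\)$", rest)
    xrefs = match.group(1).split(", ") if match else []
    label = rest[:match.start()] if match else rest
    return code.rstrip("."), label, xrefs

def parent(code):
    """Return the top-level letter for a code such as 'H.3', or None for 'H'."""
    return code.split(".")[0] if "." in code else None

entries = [
    "H. Information Systems",
    "H.2 DATABASE MANAGEMENT (E.5)",
    "H.5 INFORMATION INTERFACES AND PRESENTATION (e.g., HCI) (I.7)",
    "I.7 DOCUMENT AND TEXT PROCESSING (H.4, H.5)",
]
parsed = [parse_entry(e) for e in entries]
```

Note that the "(e.g., HCI)" gloss stays in the label, since it does not match the assumed code pattern.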
|
|
18
|
- Facet – a characteristic of the resource (e.g., language)
- Each facet organized hierarchically
- allow drill-down browsing
- represented by
- set values (taxonomy)
- continuous values (spectrum)
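Drill-down browsing over facets can be sketched as successive filters: a hierarchical facet is narrowed by a path prefix, and a continuous facet by a range. The records, facet names, and helper functions below are hypothetical.

```python
RECORDS = [  # hypothetical catalogue records
    {"title": "A", "language": ("Indo-European", "Germanic", "English"), "year": 1998},
    {"title": "B", "language": ("Indo-European", "Germanic", "German"), "year": 2004},
    {"title": "C", "language": ("Indo-European", "Romance", "French"), "year": 2001},
    {"title": "D", "language": ("Sino-Tibetan", "Sinitic", "Mandarin"), "year": 2004},
]

def drill_down(records, facet, path):
    """Set-valued facet: keep records whose hierarchical value starts with path."""
    path = tuple(path)
    return [r for r in records if r[facet][:len(path)] == path]

def in_range(records, facet, low, high):
    """Continuous facet: keep records whose value lies in [low, high]."""
    return [r for r in records if low <= r[facet] <= high]

# Each step narrows the previous result, one facet at a time.
germanic = drill_down(RECORDS, "language", ["Indo-European", "Germanic"])
recent_germanic = in_range(germanic, "year", 2000, 2010)
```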
|
|
19
|
- Ranganathan proposed 5 basic facets (PMEST):
- Personality – the subject matter
- Material
- Energy – process or action
- Space
- Time
- Each facet would have its own classification schedule
- String together notation to get classification number
- Example:
- The design of wooden furniture in 18th century America
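Stringing facet notation together can be illustrated with the example above. This sketch makes two assumptions: the connecting symbols are the ones commonly cited for Colon Classification (comma, semicolon, colon, dot, apostrophe), and the facet values are plain words rather than real schedule numbers. The assignment of the example to facets is one plausible reading, not taken from the slides.

```python
# Connecting symbols commonly cited for each facet (assumed here).
CONNECTORS = {"P": ",", "M": ";", "E": ":", "S": ".", "T": "'"}
ORDER = "PMEST"

def synthesize(facets):
    """Join the facets present, in PMEST order, each preceded by its connector."""
    return "".join(CONNECTORS[f] + facets[f] for f in ORDER if f in facets)

# "The design of wooden furniture in 18th century America"
example = {
    "P": "furniture",      # Personality: the subject matter
    "M": "wood",           # Material
    "E": "design",         # Energy: process or action
    "S": "America",        # Space
    "T": "18th century",   # Time
}
label = synthesize(example)   # ",furniture;wood:design.America'18th century"
```

In a real scheme each word would be replaced by the notation found in that facet's own schedule.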
|
|
20
|
- Now that we have free-text searching, do you feel controlled
vocabularies are still necessary or not? What do you feel their impact
will be in the future of the digital library?
- How would you improve the ACM classification scheme? How would you deal with legacy schemes?
- Booksellers also need to use classification to shelve books. Which type of classification do you
think booksellers use? Would you make any adaptations to the classification schemes shown today?
|