Manual cataloging and indexing

Module 5, Min-Yen KAN

*Heavily drawn from Lancaster (98), Indexing and Abstracting in Theory and Practice

Mesopotamian Catalogs

Mesopotamians kept track of their tablets with a list of their incipits (the opening words of each text):

What is it?

A poem?
|
Some Definitions

(Subject) Indexing

Assigning index terms to represent a document

Assists in document retrieval

Classification

Assigning a label to a document to assist in organizing that information

Labels are not necessarily semantic

Steps in Subject Indexing

Conceptual analysis

Determine “aboutness”

Computational approaches: TF × IDF

Translation

Expressing the concepts as index terms

In controlled vocabularies, similar to Taylor’s (68) compromised need

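To make the computational side of conceptual analysis concrete, here is a minimal TF × IDF sketch. It assumes raw term counts for TF and IDF = log(N / df), which is one common variant among several; the three toy documents and the naive whitespace tokenizer are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Score each term in each document as raw term frequency x log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return scores


# Hypothetical toy collection.
docs = [
    "history of sociology sociology sociology",
    "history of mathematics",
    "sociology of education",
]
scores = tf_idf(docs)
# The highest-scoring term is a candidate "aboutness" term for each document.
top = [max(s, key=s.get) for s in scores]
print(top)  # ['sociology', 'mathematics', 'education']
```

Note that "of" scores zero everywhere because it occurs in every document; TF × IDF only approximates aboutness, which is why a human indexer still performs the translation step.
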
Conceptual analysis

Generic: What is it about? What is the main content?

e.g., The History of Sociology

Specific: Why has it been added to our collection? What aspects will our users be interested in?

cf. “Every reader his book”

Thus, organizations index differently:

Different subjects (specialty vs. general interest)

Different materials (own materials vs. third-party)

Index terms

Number of index terms in a record

Long (Exhaustive)

Gives good recall at the cost of precision

Few records fit in the UI

Hard to figure out which are the main aspects

Short (Selective)

Gives good precision at the cost of recall

Less work

In practice: offer levels of indexing for different tasks

Index Terms

Abstract

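The recall/precision trade-off between exhaustive and selective indexing can be shown on a toy collection. In this hypothetical sketch each document lists its main aspects and its passing mentions; exhaustive indexing assigns all of them, selective indexing assigns only the single most central aspect (an assumed rule), and a document counts as relevant to a query when the query topic is one of its main aspects.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    main_aspects: list[str]    # ordered, most central first
    minor_mentions: list[str]  # topics mentioned only in passing


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def exhaustive(d: Doc) -> set[str]:
    """Long record: every aspect the document touches on."""
    return set(d.main_aspects) | set(d.minor_mentions)


def selective(d: Doc) -> set[str]:
    """Short record: only the single most central aspect (hypothetical rule)."""
    return {d.main_aspects[0]}


# Hypothetical toy collection.
docs = [
    Doc("d1", ["sociology", "history"], []),
    Doc("d2", ["history"], ["sociology"]),
    Doc("d3", ["education", "sociology"], []),
    Doc("d4", ["sociology"], ["statistics"]),
]

query = "sociology"
relevant = {d.doc_id for d in docs if query in d.main_aspects}
for name, index in [("exhaustive", exhaustive), ("selective", selective)]:
    retrieved = {d.doc_id for d in docs if query in index(d)}
    print(name, precision_recall(retrieved, relevant))
# exhaustive (0.75, 1.0)
# selective (1.0, 0.6666666666666666)
```

Exhaustive indexing also retrieves d2, which only mentions sociology in passing; selective indexing misses d3, where sociology is a secondary main aspect.
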
Translation

Extraction: use terms directly from the source itself

Assignment: use terms from an outside source

Usually from a controlled vocabulary

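The two translation modes can be contrasted in a few lines. This is a minimal sketch assuming a tiny hypothetical controlled vocabulary, stored as an entry-term map from the wording found in documents to the preferred term. Extraction keeps the document's own words; assignment maps them onto the vocabulary.

```python
# Hypothetical controlled vocabulary: entry (lead-in) terms mapped to preferred terms.
ENTRY_TERMS = {
    "cars": "Automobiles",
    "automobiles": "Automobiles",
    "pollution": "Air pollution",
    "smog": "Air pollution",
}

STOPWORDS = {"and", "the", "of", "in"}


def extract(text: str) -> list[str]:
    """Extraction: take candidate terms straight from the source text."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def assign(candidates: list[str]) -> list[str]:
    """Assignment: map candidates onto the controlled vocabulary, dropping unmapped ones."""
    assigned: list[str] = []
    for term in candidates:
        preferred = ENTRY_TERMS.get(term)
        if preferred and preferred not in assigned:
            assigned.append(preferred)
    return assigned


title = "Cars and smog in the city"
print(extract(title))          # ['cars', 'smog', 'city']
print(assign(extract(title)))  # ['Automobiles', 'Air pollution']
```

"city" is lost under assignment because the hypothetical vocabulary has no term for it, while "cars" and "smog" are normalized to the preferred terms.
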
|
Controlled vocabularies

Benefits

(Potentially) high precision and high recall

Question: which of these components is more important?

Drawbacks

Costly to construct and maintain

Difficult to use

Requires knowledge of the CV

Controlled vocabulary objectives

Control (or suggest) synonyms and pick an authoritative term

Especially for entities: people (maiden names to married names), places (St. Petersburg)

Distinguish among homographs (e.g., mercury, turkey)

Link terms by their relationships (hierarchical is-a relations; all others are associative)

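The first two objectives can be sketched as an authority lookup: variant names resolve to one authoritative heading, and a homograph is accepted only with a disambiguating qualifier. The headings and the parenthetical qualifier style below are illustrative assumptions, not entries copied from any published vocabulary.

```python
from typing import Optional

# Hypothetical authority data: variant form -> authoritative heading.
VARIANTS = {
    "leningrad": "Saint Petersburg (Russia)",
    "petrograd": "Saint Petersburg (Russia)",
    "st. petersburg": "Saint Petersburg (Russia)",
}

# Hypothetical homograph table: one surface form, several qualified headings.
HOMOGRAPHS = {
    "mercury": ["Mercury (Planet)", "Mercury (Metal)", "Mercury (Roman deity)"],
    "turkey": ["Turkey (Country)", "Turkey (Bird)"],
}


def authorize(term: str, qualifier: Optional[str] = None) -> str:
    """Return the authoritative heading for a term, insisting on a qualifier for homographs."""
    key = term.strip().lower()
    if key in HOMOGRAPHS:
        if qualifier is None:
            raise ValueError(f"ambiguous term {term!r}; choose one of {HOMOGRAPHS[key]}")
        heading = f"{key.title()} ({qualifier})"
        if heading not in HOMOGRAPHS[key]:
            raise ValueError(f"unknown sense {heading!r} for {term!r}")
        return heading
    # Variant forms resolve to the authoritative heading; unknown terms pass through unchanged.
    return VARIANTS.get(key, term)


print(authorize("Leningrad"))         # Saint Petersburg (Russia)
print(authorize("mercury", "Metal"))  # Mercury (Metal)
```

Raising an error for an unqualified homograph forces the indexer to choose a sense, which is exactly the decision the controlled vocabulary is meant to make explicit.
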
Controlled vocabulary usability

Good structure to find the appropriate term

Standard fields in a CV:

USE / UF: Use (the authoritative term instead) / Used For

BT / NT: Broader / Narrower Term in the hierarchy

RT: Related Term (associative relationship)

Applied by experienced personnel

A large vocabulary can be hard to map concepts to

Question: What should be done if the controlled vocabulary has no term for the concept to be indexed?

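One way to model the standard fields is to store each relationship in both directions, since USE/UF and BT/NT are reciprocal pairs and RT is symmetric. This is an assumed in-memory design for illustration, not the data model of any particular thesaurus system; the sample terms are hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical in-memory thesaurus keeping USE/UF, BT/NT and RT links reciprocal."""

    def __init__(self) -> None:
        self.use: dict[str, str] = {}                    # non-preferred -> preferred (USE)
        self.uf: dict[str, set[str]] = defaultdict(set)  # preferred -> non-preferred (UF)
        self.bt: dict[str, set[str]] = defaultdict(set)  # term -> broader terms (BT)
        self.nt: dict[str, set[str]] = defaultdict(set)  # term -> narrower terms (NT)
        self.rt: dict[str, set[str]] = defaultdict(set)  # term -> related terms (RT)

    def add_use(self, non_preferred: str, preferred: str) -> None:
        self.use[non_preferred] = preferred
        self.uf[preferred].add(non_preferred)

    def add_broader(self, narrower: str, broader: str) -> None:
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_related(self, a: str, b: str) -> None:
        self.rt[a].add(b)
        self.rt[b].add(a)

    def preferred(self, term: str) -> str:
        """Follow a USE reference if there is one."""
        return self.use.get(term, term)

    def expand(self, term: str) -> set[str]:
        """The preferred term plus everything below it in the BT/NT hierarchy."""
        result: set[str] = set()
        stack = [self.preferred(term)]
        while stack:
            t = stack.pop()
            if t not in result:
                result.add(t)
                stack.extend(self.nt[t])
        return result


th = Thesaurus()
th.add_use("Cars", "Automobiles")
th.add_broader("Automobiles", "Motor vehicles")
th.add_broader("Trucks", "Motor vehicles")
th.add_related("Automobiles", "Highways")

print(th.preferred("Cars"))                 # Automobiles
print(sorted(th.uf["Automobiles"]))         # ['Cars']
print(sorted(th.expand("Motor vehicles")))  # ['Automobiles', 'Motor vehicles', 'Trucks']
print(sorted(th.rt["Highways"]))            # ['Automobiles']
```

Writing both directions at insert time means a display can show UF, NT and RT lists for any term without a separate scan, at the cost of keeping the pairs in sync on every edit.
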
|
Controlled vocabulary examples

General CVs

Sears List of Subject Headings

More general divisions, not intended for research libraries

Geared towards general subdivisions

Library of Congress Subject Headings (LCSH)

Comprehensive, very large, over five volumes

Domain-specific CVs

Medical Subject Headings (MeSH)

Byproduct of indexing at the National Library of Medicine (NLM)

Art & Architecture Thesaurus (AAT)

Objects, images, architecture, styles

ERIC Thesaurus

Educational materials (journals, lesson plans and computer files)
|
Classification

Objectives of classification

Uniqueness

Be able to fetch a specific resource given its call number

Notational Permanence

Should seldom have to reorganize or reassign labels (e.g., after a paradigm shift in mathematics)

Comprehensiveness

Can successfully classify most things

Serendipity

Collocate related subjects together

Ease of Use

Provides ways of resolving ambiguities (e.g., given religious architecture and Egyptian architecture, where does an article on the architecture of Egyptian temples go?)
|
Types of classification

Enumerative

Produce an alphabetical list of subject headings; assign numbers to the headings in alphabetical order

Hierarchical

Recursively divides subjects, from most general to most specific

Faceted (analytico-synthetic)

Analytic: divides subjects into mutually exclusive, orthogonal facets

Synthetic: combines facets to get a new class

- From Taylor (92)

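A short sketch of the enumerative scheme as described above shows why it sits awkwardly with notational permanence: headings are sorted alphabetically and numbered in that order, so adding one heading renumbers every heading after it. The headings are hypothetical.

```python
def enumerate_headings(headings: list[str]) -> dict[str, int]:
    """Enumerative scheme as described: sort headings alphabetically, number them in order."""
    return {h: i for i, h in enumerate(sorted(headings), start=1)}


before = enumerate_headings(["Architecture", "Botany", "Zoology"])
after = enumerate_headings(["Architecture", "Botany", "Chemistry", "Zoology"])
changed = [h for h in before if before[h] != after[h]]

print(before)   # {'Architecture': 1, 'Botany': 2, 'Zoology': 3}
print(after)    # {'Architecture': 1, 'Botany': 2, 'Chemistry': 3, 'Zoology': 4}
print(changed)  # ['Zoology']
```
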
Dewey Decimal Classification

Divide knowledge into ten classes

Recursively divide these categories into ten (or fewer) classes

Each division assigns another digit

What type of classification scheme is it?

000 Generalities
100 Philosophy & psychology
200 Religion
300 Social sciences
400 Language
500 Natural sciences & mathematics
600 Technology (Applied sciences)
700 The arts
800 Literature & rhetoric
900 Geography & history

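Because each Dewey digit narrows the class named by the digit before it, the hierarchy can be read straight off the notation. A minimal sketch, assuming a three-digit class number (any decimal extension is ignored here) and using only the ten main classes listed above.

```python
MAIN_CLASSES = {
    "0": "Generalities",
    "1": "Philosophy & psychology",
    "2": "Religion",
    "3": "Social sciences",
    "4": "Language",
    "5": "Natural sciences & mathematics",
    "6": "Technology (Applied sciences)",
    "7": "The arts",
    "8": "Literature & rhetoric",
    "9": "Geography & history",
}


def ddc_path(number: str) -> list[str]:
    """Return the main class, division, and section notations that contain a DDC number."""
    digits = number.split(".")[0]  # this sketch ignores any decimal extension
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError("expected a class number such as '512' or '512.5'")
    path = [digits[0] + "00", digits[:2] + "0", digits]
    return list(dict.fromkeys(path))  # drop repeats, e.g. '300' is its own main class


def main_class(number: str) -> str:
    return MAIN_CLASSES[ddc_path(number)[0][0]]


print(ddc_path("512"), main_class("512"))
# ['500', '510', '512'] Natural sciences & mathematics
```
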
|
ACM Classification scheme

Four-level tree (3 coded levels and a fourth, uncoded level)

16 General Terms

H. Information Systems
H.0 GENERAL
H.1 MODELS AND PRINCIPLES
H.2 DATABASE MANAGEMENT (E.5)
H.3 INFORMATION STORAGE AND RETRIEVAL
H.4 INFORMATION SYSTEMS APPLICATIONS
H.5 INFORMATION INTERFACES AND PRESENTATION (e.g., HCI) (I.7)
H.m MISCELLANEOUS

I. Computing Methodologies
I.0 GENERAL
I.1 SYMBOLIC AND ALGEBRAIC MANIPULATION
I.2 ARTIFICIAL INTELLIGENCE
I.3 COMPUTER GRAPHICS
I.4 IMAGE PROCESSING AND COMPUTER VISION
I.5 PATTERN RECOGNITION
I.6 SIMULATION AND MODELING (G.3)
I.7 DOCUMENT AND TEXT PROCESSING (H.4, H.5)
I.m MISCELLANEOUS

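The coded levels can be navigated by splitting the notation on dots, and the parenthesized codes in the listing above act as cross-references to other branches of the tree. A minimal sketch, assuming codes of the form "H.3" and using only the H and I entries shown on this slide; a deeper code such as "H.3.1" simply adds one more level.

```python
# Entries from the excerpt above: code -> (label, cross-references).
ACM_CCS = {
    "H": ("Information Systems", []),
    "H.0": ("GENERAL", []),
    "H.1": ("MODELS AND PRINCIPLES", []),
    "H.2": ("DATABASE MANAGEMENT", ["E.5"]),
    "H.3": ("INFORMATION STORAGE AND RETRIEVAL", []),
    "H.4": ("INFORMATION SYSTEMS APPLICATIONS", []),
    "H.5": ("INFORMATION INTERFACES AND PRESENTATION", ["I.7"]),
    "H.m": ("MISCELLANEOUS", []),
    "I": ("Computing Methodologies", []),
    "I.0": ("GENERAL", []),
    "I.1": ("SYMBOLIC AND ALGEBRAIC MANIPULATION", []),
    "I.2": ("ARTIFICIAL INTELLIGENCE", []),
    "I.3": ("COMPUTER GRAPHICS", []),
    "I.4": ("IMAGE PROCESSING AND COMPUTER VISION", []),
    "I.5": ("PATTERN RECOGNITION", []),
    "I.6": ("SIMULATION AND MODELING", ["G.3"]),
    "I.7": ("DOCUMENT AND TEXT PROCESSING", ["H.4", "H.5"]),
    "I.m": ("MISCELLANEOUS", []),
}


def ancestors(code: str) -> list[str]:
    """'H.3.1' -> ['H', 'H.3', 'H.3.1']: each dot adds one coded level."""
    parts = code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def describe(code: str) -> str:
    """Breadcrumb of known labels, plus any cross-references for the code itself."""
    labels = [ACM_CCS[c][0] for c in ancestors(code) if c in ACM_CCS]
    see_also = ACM_CCS.get(code, ("", []))[1]
    text = " > ".join(labels)
    return text + (f" (see also {', '.join(see_also)})" if see_also else "")


print(describe("I.7"))
# Computing Methodologies > DOCUMENT AND TEXT PROCESSING (see also H.4, H.5)
```
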
Faceted Indexing

Facet – a characteristic of the resource (e.g., language)

Each facet is organized hierarchically, allowing drill-down browsing

Facet values are represented by either:

set values (taxonomy)

continuous values (spectrum)

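Drill-down browsing over facets amounts to intersecting one filter per facet and showing value counts for the next step. The sketch assumes a hypothetical record layout with one set-valued facet (language, kept flat here for brevity rather than hierarchical) and one continuous facet (publication year, filtered by range); the records are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    language: str  # set-valued facet (taxonomy)
    year: int      # continuous facet (spectrum)


# Hypothetical records.
RECORDS = [
    Record("A", "English", 1998),
    Record("B", "French", 2003),
    Record("C", "English", 2005),
    Record("D", "German", 2010),
]


def drill_down(records: list[Record],
               languages: Optional[set[str]] = None,
               year_range: Optional[tuple[int, int]] = None) -> list[Record]:
    """Apply each selected facet as a filter; unselected facets leave the set unchanged."""
    result = records
    if languages is not None:
        result = [r for r in result if r.language in languages]
    if year_range is not None:
        lo, hi = year_range
        result = [r for r in result if lo <= r.year <= hi]
    return result


def facet_counts(records: list[Record]) -> dict[str, int]:
    """Counts shown next to each language value so users can see where to drill next."""
    counts: dict[str, int] = {}
    for r in records:
        counts[r.language] = counts.get(r.language, 0) + 1
    return counts


step1 = drill_down(RECORDS, year_range=(2000, 2010))
print(facet_counts(step1))  # {'French': 1, 'English': 1, 'German': 1}
step2 = drill_down(step1, languages={"English"})
print([r.title for r in step2])  # ['C']
```

Because the facets are orthogonal, the filters can be applied in any order and give the same result.
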
Colon Classification

Ranganathan proposed 5 basic facets (PMEST):

Personality – the subject matter
Material
Energy – process or action
Space
Time

Each facet has its own classification schedule

String the facet notations together to get the classification number

Example:

The design of wooden furniture in 18th century America

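Once each facet's schedule has produced its notation, stringing the facets together is mechanical. In this sketch the facet order follows PMEST and the connecting symbols are those commonly described for Colon Classification (comma for Personality, semicolon for Matter, colon for Energy, full stop for Space, apostrophe for Time); treat them as an assumption to check against the schedules. The facet values for the furniture example are hypothetical placeholder words, not real Colon Classification numbers.

```python
# Facet indicators as commonly described for Colon Classification (assumption: verify
# against the schedules of the edition in use).
INDICATORS = {"P": ",", "M": ";", "E": ":", "S": ".", "T": "'"}
PMEST_ORDER = ["P", "M", "E", "S", "T"]


def colon_number(main_class: str, facets: dict[str, str]) -> str:
    """Concatenate the main class with each present facet, in PMEST order, prefixed by its indicator."""
    unknown = set(facets) - set(INDICATORS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    parts = [main_class]
    for facet in PMEST_ORDER:
        if facet in facets:
            parts.append(INDICATORS[facet] + facets[facet])
    return "".join(parts)


# Hypothetical placeholders for "The design of wooden furniture in 18th century America".
example = colon_number("MC", {
    "T": "18c",
    "P": "furniture",
    "S": "america",
    "M": "wood",
    "E": "design",
})
print(example)  # MC,furniture;wood:design.america'18c
```

The facets are deliberately passed out of order: the function imposes the fixed PMEST citation order, which is what keeps synthesized numbers consistent across indexers.
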
|
To think about…

Now that we have free-text searching, do you feel controlled vocabularies are still necessary? What do you feel their impact will be on the future of the digital library?

How would you improve the ACM classification scheme? How should we deal with legacy schemes?

Booksellers also need to use classification to shelve books. Which type of classification do you think booksellers use? Would you make any adaptations to the classification schemes shown today?