10 Aug 2004
CS 5244: Orientation
Is fine-grained indexing worthwhile?
• Problem: with document-level indexing we still have to scan the document to find the term.
• Cons:
  - Need access methods that can take advantage of the stored positions
  - Extra storage overhead (position lists are variable-sized)
• Alternative methods:
  - Hierarchical encoding (doc #, para #, sent #, word #) to shrink offset size
  - Split long documents into n shorter ones.
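The two alternatives above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the nested-list document layout and the helper names `hierarchical_positions` and `split_document` are hypothetical, not part of the lecture. The point is that each component of a (para #, sent #, word #) tuple stays small even when the flat offset grows, and that splitting a document bounds offsets by the chunk length.

```python
def hierarchical_positions(doc):
    """doc is a list of paragraphs, each a list of sentences, each a list of words.

    Returns (term, flat_offset, (para, sent, word)) for every word, so the
    size of a flat offset can be compared with the hierarchical components.
    """
    rows = []
    flat = 0
    for p, para in enumerate(doc):
        for s, sent in enumerate(para):
            for w, word in enumerate(sent):
                rows.append((word.lower(), flat, (p, s, w)))
                flat += 1
    return rows


def split_document(words, n):
    """Split a word list into at most n chunks of (nearly) equal length."""
    size = max(1, -(-len(words) // n))  # ceiling division, at least 1
    return [words[i:i + size] for i in range(0, len(words), size)]


# Hypothetical sample document: two paragraphs, three sentences in total.
doc = [
    [["Inverted", "files", "map", "terms", "to", "documents"],
     ["Each", "posting", "may", "store", "offsets"]],
    [["Long", "documents", "need", "large", "offsets"]],
]

rows = hierarchical_positions(doc)
largest_flat = max(flat for _, flat, _ in rows)
largest_component = max(max(pos) for _, _, pos in rows)

all_words = [term for term, _, _ in rows]
chunks = split_document(all_words, 4)
```

On this toy input the largest flat offset is 15 while no hierarchical component exceeds 5. On real documents the gap widens, which is what lets each component be stored in fewer bits.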
Document-level index, term: (doc, count)
  Image     (D1, 2), (D4, 1)
  Implicit  (D2, 1), (D3, 1) …
  Index     (D5, 3), (D2, 1) …
  Inverse   (D2, 2)
  Internet  (D1, 2), (D3, 2) …

Fine-grained index, term: (doc, count; offsets)
  Image     (D1, 2; 10, 205), (D4, 1; 3993)
  Implicit  (D2, 1; 242), (D3, 1; 233) …
  Index     (D5, 3; 20, 42, 3920), (D2, 1  …
  Inverse   (D2, 2; 599, 847)
  Internet  (D1, 2; 12, 43), (D3, 2; 302, …
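As a rough sketch of why the stored offsets help, the Python below builds a word-level index and answers a two-word phrase query from the position lists alone, without rescanning any document. The sample documents and the function names `build_index` and `phrase_search` are assumptions made for illustration. Offsets here are word positions counted from 0, which may differ from the offset unit used in the listing above.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [word offsets]} (a fine-grained index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, first, second):
    """Find documents where `second` immediately follows `first`.

    Uses only the position lists; no document text is scanned.
    Returns a sorted list of (doc_id, [start offsets]).
    """
    hits = []
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        starts = [p for p in positions if p + 1 in following]
        if starts:
            hits.append((doc_id, starts))
    return sorted(hits)


# Hypothetical sample collection.
docs = {
    "D1": "image index for internet image search",
    "D2": "inverse index and implicit index",
}
index = build_index(docs)

image_index_hits = phrase_search(index, "image", "index")
implicit_index_hits = phrase_search(index, "implicit", "index")
```

With a document-level index the same query could only narrow the search to documents containing both terms. Each candidate would then have to be scanned to check adjacency, which is the problem the slide names.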