• Problem: we still have to scan the document to find where the term occurs.
• Cons:
  - Need access methods that can take advantage of the stored positions
  - Extra storage overhead (variable-sized position lists)
• Alternative methods:
  - Hierarchical encoding (doc #, para #, sent #, word #) to shrink offset size
  - Split long documents into n shorter ones
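
The hierarchical encoding idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `hierarchical_positions`, the paragraph and sentence splitting rules, and the sample text are all hypothetical and not taken from the slides. The point is that numbering restarts at each level, so each component stays small even in a long document.

```python
import re

def hierarchical_positions(text):
    """Yield (term, (para_no, sent_no, word_no)) for every word in text.

    Assumption: paragraphs are separated by blank lines and sentences
    end with '.', '!' or '?'. All numbering is 1-based and restarts at
    each level of the hierarchy.
    """
    for p, para in enumerate(re.split(r"\n\s*\n", text.strip()), start=1):
        sentences = [s for s in re.split(r"[.!?]+", para) if s.strip()]
        for s, sent in enumerate(sentences, start=1):
            for w, word in enumerate(re.findall(r"\w+", sent.lower()), start=1):
                yield word, (p, s, w)

# Hypothetical sample document (not from the slides).
doc = "Image search is hard. Index the image.\n\nThe index stores offsets."

positions = {}
for term, pos in hierarchical_positions(doc):
    positions.setdefault(term, []).append(pos)

print(positions["image"])  # (para, sent, word) triples for "image"
```

Each triple would be combined with a document number to give the full (doc #, para #, sent #, word #) address. Whether this actually saves space depends on how the components are packed, which the sketch does not attempt.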
Image    (D1, 2), (D4, 1)
Implicit (D2, 1), (D3, 1) …
Index    (D5, 3), (D2, 1) …
Inverse  (D2, 2)
Internet (D1, 2), (D3, 2) …
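
A minimal sketch of how postings of the form (document, count) might be built, assuming simple lowercase word tokenization. The function name `build_frequency_index` and the two sample documents are hypothetical. Note that the result says how often a term occurs in a document but not where.

```python
import re
from collections import Counter, defaultdict

def build_frequency_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = Counter(re.findall(r"\w+", text.lower()))
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index

# Hypothetical sample documents (not the D1..D5 of the slides).
docs = {
    "D1": "image of an image on the internet",
    "D2": "inverse index",
}
freq_index = build_frequency_index(docs)
print(freq_index["image"])  # postings for "image"
```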

Image    (D1, 2; 10, 205), (D4, 1; 3993)
Implicit (D2, 1; 242), (D3, 1; 233) …
Index    (D5, 3; 20, 42, 3920), (D2, 1 …
Inverse  (D2, 2; 599, 847)
Internet (D1, 2; 12, 43), (D3, 2; 302, …
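
One possible access method over postings that carry positions is an adjacent-word (phrase) lookup that never rescans the document text. The sketch below assumes 0-based word offsets, which may differ from the offset units used in the listing above. The names `build_positional_index` and `phrase_docs` and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [word offsets]}; the count is len(offsets)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for offset, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term].setdefault(doc_id, []).append(offset)
    return index

def phrase_docs(index, first, second):
    """Return ids of docs where `second` occurs right after `first`."""
    hits = []
    for doc_id, offsets in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        if any(o + 1 in following for o in offsets):
            hits.append(doc_id)
    return hits

# Hypothetical sample documents (not the D1..D5 of the slides).
docs = {
    "D1": "inverse index maps terms to documents",
    "D2": "index the inverse image",
}
pos_index = build_positional_index(docs)
print(phrase_docs(pos_index, "inverse", "index"))
```

Both documents contain both words, but only the position lists can tell that the words are adjacent in D1 and not in D2. The cost is the variable-length offset list stored with every posting.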