10 Aug 2004
CS 5244: Orientation
Building the index – memory-based inversion
• Takes lots of main memory, ugh! Every (d, f_d,t) node stays in memory until Phase II.
• Can we reduce the memory requirement?
Initialize an empty dictionary S
// Phase I – collect term appearances in memory
For each document D_d in the collection, 1 ≤ d ≤ N
    Read D_d, parsing it into index terms
    For each index term t in D_d
        Let f_d,t be the frequency of t in D_d
        Search S for t; if it is not present, insert it
        Append node (d, f_d,t) to the list for term t
// Phase II – dump the inverted file
For each term 1 ≤ t ≤ n
    Start a new inverted file entry
    Append each (d, f_d,t) in the list for t to the entry
    Append the entry to the inverted file
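The two phases above can be sketched in Python. This is a minimal illustration, not the course's reference code. It makes several assumptions. Documents are plain strings. `parse_terms` is a hypothetical parser that lowercases text and keeps alphanumeric tokens. The dictionary S is a Python dict, which is a hash table. Terms 1..n are taken in lexicographic order.

```python
import re
from collections import Counter


def parse_terms(text):
    # Hypothetical parser: lowercase alphanumeric tokens stand in for "index terms".
    return re.findall(r"[a-z0-9]+", text.lower())


def invert_in_memory(documents):
    """Memory-based inversion of a list of document strings (ids d = 1..N)."""
    S = {}  # in-memory dictionary: term t -> list of (d, f_d,t) nodes

    # Phase I: collect term appearances in memory
    for d, text in enumerate(documents, start=1):
        f_d = Counter(parse_terms(text))  # f_d,t for every term t in D_d
        for t, f_dt in f_d.items():
            if t not in S:          # search S for t; insert it if absent
                S[t] = []
            S[t].append((d, f_dt))  # nodes arrive in increasing d order

    # Phase II: dump the inverted file, one entry per term
    inverted_file = []
    for t in sorted(S):             # assumed: terms 1..n taken in lexicographic order
        entry = list(S[t])          # new entry holding every (d, f_d,t) for t
        inverted_file.append((t, entry))
    return inverted_file


if __name__ == "__main__":
    docs = ["the cat sat", "the cat ate the rat"]
    for t, entry in invert_in_memory(docs):
        print(t, entry)
```

Because the documents are read in order of d, each term's list is already sorted by document number when Phase II runs. Nothing has to be re-sorted within an entry. The catch is memory: all (d, f_d,t) nodes are held until the end, and that is the cost the first bullet complains about.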