5244: Orientation - Practical IR

10 Aug 2004

CS 5244: Orientation


Building the index – memory-based inversion

• Takes lots of main memory: every (d, f_{d,t}) node for the whole collection is held in RAM before anything is written out. Ugh!
• Can we reduce the memory requirement?

Initialize empty dictionary S

// Phase I – collect term appearances in memory
For each document D_d in the collection, 1 ≤ d ≤ N
    Read D_d, parsing it into index terms
    For each distinct index term t in D_d
        Calculate f_{d,t}, the number of times t occurs in D_d
        Search S for t; if it is not present, insert it
        Append node (d, f_{d,t}) to the list for term t

// Phase II – dump the inverted file
For each term t in S, 1 ≤ t ≤ n (n = number of distinct terms)
    Start a new inverted file entry
    Append each (d, f_{d,t}) node in t's list to the entry
    Append the entry to the inverted file
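A minimal Python sketch of the two phases above, under stated assumptions: `parse_terms` is a hypothetical tokenizer (lowercased alphanumeric runs), a Python `dict` plays the role of the dictionary S, and the "inverted file" is returned as an in-memory list instead of being written to disk.

```python
import re
from collections import Counter


def parse_terms(text: str) -> list[str]:
    # Hypothetical tokenizer (assumption): index terms are lowercased alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def invert_in_memory(docs: list[str]) -> list[tuple[str, list[tuple[int, int]]]]:
    # Phase I: S maps each term t to its in-memory list of (d, f_dt) nodes.
    S: dict[str, list[tuple[int, int]]] = {}
    for d, text in enumerate(docs, start=1):       # documents D_1 .. D_N
        freqs = Counter(parse_terms(text))          # f_{d,t} for each distinct term t in D_d
        for t, f_dt in freqs.items():
            S.setdefault(t, []).append((d, f_dt))   # insert t if absent, then append node

    # Phase II: emit one inverted file entry per term, in sorted term order.
    # Assumption: the "inverted file" is a Python list here, not a file on disk.
    inverted_file = []
    for t in sorted(S):
        entry = list(S[t])   # nodes are already in increasing d order
        inverted_file.append((t, entry))
    return inverted_file


if __name__ == "__main__":
    for term, postings in invert_in_memory(["the cat sat", "the cat and the hat"]):
        print(term, postings)
```

Because documents are read in order of d, each term's list comes out already sorted by document number, so Phase II only has to order the terms. The cost is that S holds one node per (d, t) pair for the entire collection until the dump, which is the main-memory problem noted above.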