17 Sep 2003
CS 6210 – Module 6
Adjusting Document Frequencies
• We know the rank r of each word in the sample, ordered by document frequency
• We know the absolute document frequency f of some words from one-word queries
• Mandelbrot’s formula empirically relates word frequency f to rank r
• We use curve fitting to estimate the absolute frequency of every word in the sample
[Figure: document frequency f plotted against rank r]
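The curve-fitting step could be sketched as follows. The sketch assumes the usual form of Mandelbrot's formula, f = P (r + p)^(-B), where P, p and B are the parameters to fit; the slide does not spell the formula out, so these parameter names, the function names `fit_mandelbrot` and `estimate_frequency`, and the search range for p are all illustrative assumptions. For a fixed p the formula is linear in log space (log f = log P - B log(r + p)), so the sketch tries a grid of p values and solves an ordinary least-squares fit for each.

```python
import math


def fit_mandelbrot(ranks, freqs, p_grid=None):
    """Fit f = P * (r + p) ** (-B) to known (rank, frequency) pairs.

    ranks: ranks (>= 1) of the words whose absolute frequency is known.
    freqs: their absolute document frequencies (> 0), e.g. from one-word queries.
    Returns (P, p, B). At least two distinct ranks are required.
    """
    if p_grid is None:
        # Assumed search range for the offset p: 0.0, 0.1, ..., 10.0.
        p_grid = [i * 0.1 for i in range(101)]
    ys = [math.log(f) for f in freqs]
    n = len(ys)
    y_mean = sum(ys) / n
    best = None
    for p in p_grid:
        xs = [math.log(r + p) for r in ranks]
        x_mean = sum(xs) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        if sxx == 0:
            continue  # all ranks identical: the slope is undefined
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        # Squared error in log space for this choice of p.
        err = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, math.exp(intercept), p, -slope)
    if best is None:
        raise ValueError("need at least two distinct ranks to fit the curve")
    _, P, p, B = best
    return P, p, B


def estimate_frequency(rank, P, p, B):
    """Estimated absolute document frequency of the word at the given rank."""
    return P * (rank + p) ** (-B)


if __name__ == "__main__":
    # Synthetic probe points generated from an assumed curve, for illustration only.
    probe_ranks = [1, 5, 20, 100, 400]
    probe_freqs = [estimate_frequency(r, 1000.0, 2.0, 1.1) for r in probe_ranks]
    P, p, B = fit_mandelbrot(probe_ranks, probe_freqs)
    # Estimate the frequency of a word whose rank is known but whose frequency is not.
    print(estimate_frequency(50, P, p, B))
```

Fitting in log space keeps the example dependency-free; a nonlinear least-squares routine would be a reasonable alternative when the probe frequencies are noisy.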