• We know the ranking r of words by document frequency in the sample

• We know the absolute document frequency f of some words from one-word queries

• Mandelbrot's formula empirically relates word frequency f to ranking r

• We use curve fitting to estimate the absolute frequency of all words in the sample
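
The curve-fitting step can be sketched as follows. This is a minimal illustration, not the method used here: it assumes the common form of Mandelbrot's formula, f = P (r + p)^(-B), and fits it by a grid search over p combined with a least-squares line fit in log-log space. The names `fit_mandelbrot` and `estimate_frequency`, the grid for p, and the sample numbers are all hypothetical.

```python
import math


def fit_mandelbrot(known, p_grid=None):
    """Fit f = P * (r + p) ** (-B) to (rank, frequency) pairs.

    `known` holds the words whose absolute document frequency was
    obtained from one-word queries, as (rank_in_sample, frequency).
    For each candidate p, log f = log P - B * log(r + p) is a straight
    line, so P and B follow from ordinary least squares; the p with the
    smallest squared error wins. The grid below is an assumption.
    """
    if len({r for r, _ in known}) < 2:
        raise ValueError("need at least two words with distinct ranks")
    if p_grid is None:
        p_grid = [i * 0.1 for i in range(101)]  # candidate p in [0, 10]

    ys = [math.log(f) for _, f in known]
    n = len(ys)
    mean_y = sum(ys) / n
    best = None
    for p in p_grid:
        xs = [math.log(r + p) for r, _ in known]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), p, -slope)

    _, P, p, B = best
    return P, p, B


def estimate_frequency(rank, P, p, B):
    """Estimated absolute document frequency of the word at `rank`."""
    return P * (rank + p) ** (-B)


if __name__ == "__main__":
    # Hypothetical (rank, frequency) pairs from one-word queries.
    known = [(1, 1400.0), (5, 610.0), (20, 160.0), (100, 31.0), (500, 5.5)]
    P, p, B = fit_mandelbrot(known)
    for rank in (2, 10, 50, 250):
        print(rank, round(estimate_frequency(rank, P, p, B), 1))
```

Once P, p, and B are fitted from the few words with known frequencies, `estimate_frequency` can be applied to the rank of every other word in the sample.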