10 Aug 2004
CS 5244: Orientation
58/32
This is TF*IDF
• Many variants, but all capture:
  ◦ Term frequency:
    $r_{d,t}$ as being __________________
  ◦ Inverse document frequency:
    $w_t$ as being ___________________
• Standard formulation is:
  $w_{d,t} = r_{d,t} \times w_t = (1 + \ln f_{d,t}) \times \ln(1 + N/f_t)$
• Problem:
  ◦ $r_{d,t}$ grows as the document grows, so it needs to be normalized; otherwise the weighting is biased towards _____________
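
The standard formulation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes the usual reading of the symbols, which the slide does not spell out: $f_{d,t}$ is the number of times term t occurs in document d, $f_t$ is the number of documents containing t, and N is the number of documents. Whitespace tokenization and the function names `tfidf_weights` and `l2_normalize` are hypothetical choices made for the example. The slide does not say which normalization to use; scaling each document vector to unit (L2) length is shown here only as one common option.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: w_dt} dict per document, where
    w_dt = (1 + ln f_dt) * ln(1 + N / f_t)."""
    n_docs = len(docs)
    # f_dt: per-document term counts (assumed tokenization: lowercase + split)
    term_counts = [Counter(doc.lower().split()) for doc in docs]

    # f_t: number of documents that contain each term
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    weights = []
    for counts in term_counts:
        weights.append({
            term: (1 + math.log(f_dt)) * math.log(1 + n_docs / doc_freq[term])
            for term, f_dt in counts.items()
        })
    return weights


def l2_normalize(vec):
    """Scale a weight vector to unit length (assumed normalization choice)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}


if __name__ == "__main__":
    docs = ["apple banana apple", "banana cherry"]
    for vec in tfidf_weights(docs):
        print(l2_normalize(vec))
```

After normalization every document vector has length 1, so weights from documents of different sizes can be compared on the same scale.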