• Many variants, but all capture:
  - Term frequency: $r_{d,t}$ as being __________________
  - Inverse document frequency: $w_t$ as being ___________________
• Standard formulation is:
  $$w_{d,t} = r_{d,t} \times w_t = (1 + \ln f_{d,t}) \times \ln\left(1 + \frac{N}{f_t}\right)$$
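As a rough illustration of the standard formulation, the following minimal Python sketch computes one weight. It assumes the usual reading of the symbols, which the slide itself does not spell out: f_{d,t} is the number of occurrences of term t in document d, f_t is the number of documents containing t, and N is the total number of documents. The function name `tf_idf_weight` and the zero weight for absent terms are illustrative choices, not part of the source.

```python
import math


def tf_idf_weight(f_dt: int, f_t: int, n_docs: int) -> float:
    """Return w_{d,t} = (1 + ln f_{d,t}) * ln(1 + N / f_t)."""
    # Assumption: a term that does not occur gets weight 0,
    # since ln(0) is undefined.
    if f_dt <= 0 or f_t <= 0:
        return 0.0
    r_dt = 1.0 + math.log(f_dt)         # term-frequency component r_{d,t}
    w_t = math.log(1.0 + n_docs / f_t)  # inverse-document-frequency component w_t
    return r_dt * w_t


# Same within-document count, but one term is rare in the collection
# and the other appears in almost every document.
rare = tf_idf_weight(f_dt=3, f_t=2, n_docs=1000)
common = tf_idf_weight(f_dt=3, f_t=900, n_docs=1000)
print(rare > common)
```

With the within-document count held fixed, the term that appears in fewer documents receives the larger weight, because only the w_t factor differs.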
• Problem:
  - $r_{d,t}$ grows as the document grows, need to normalize; otherwise biased towards _____________
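The effect of document length can be sketched with a toy example. The slide does not say which normalization to use; the sketch below assumes cosine (Euclidean length) normalization, which is one common choice, and all function names and the two toy documents are hypothetical.

```python
import math
from collections import Counter


def raw_weights(tokens, doc_freq, n_docs):
    """Unnormalized w_{d,t} = (1 + ln f_{d,t}) * ln(1 + N / f_t) per term."""
    weights = {}
    for term, f_dt in Counter(tokens).items():
        f_t = doc_freq.get(term, 0)
        if f_t > 0:
            weights[term] = (1.0 + math.log(f_dt)) * math.log(1.0 + n_docs / f_t)
    return weights


def cosine_normalize(weights):
    """Divide each weight by the Euclidean length of the document vector."""
    length = math.sqrt(sum(w * w for w in weights.values()))
    if length == 0.0:
        return {}
    return {term: w / length for term, w in weights.items()}


# Two toy documents with identical content, one repeated 50 times.
short_doc = ["cat", "sat"]
long_doc = short_doc * 50
doc_freq = {"cat": 2, "sat": 2}  # both terms occur in both documents
n_docs = 2

raw_short = raw_weights(short_doc, doc_freq, n_docs)
raw_long = raw_weights(long_doc, doc_freq, n_docs)
norm_short = cosine_normalize(raw_short)
norm_long = cosine_normalize(raw_long)

print(raw_short, raw_long)
print(norm_short, norm_long)
```

Before normalization the two documents get different weights purely because of their length; after dividing by the vector length they receive the same weights, since their term proportions are identical.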