10 Aug 2004
CS 5244: Orientation
57/32
Inverse Document Frequency
¡ Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do.
¡
¡Words with higher ft are less discriminative. 
¡Use inverse to measure importance:
lwt = 1/ft
lwt = ln (1+ N/ft) ß this one is most common
lwt = ln (1 + fm/ft), where fm is the max observed frequency
¡
¡
¡
Question: What’s the ln () here for?