¡If a document
consists of just a subset of another
document, standard VS model may
show low similarity
lExample: Cosine (D1,D2) =
.61
D1: <A, B, C>,
D2: <A, B, C, D, E, F, G, H>
¡
¡Shivakumar
and Garcia-Molina (95): use only
close words in VSM
lClose = comparable frequency, defined by a tunable ε distance.