¡Problem: If a document consists
is just a subset of another document, standard VS model
may show low similarity
lExample: cosine (D1,D2) =
.61
D1:
<A, B, C>,
D2:
<A, B, C, D, E, F, G, H>
¡
¡Shivakumar
and Garcia-Molina (95): use only close
words in VSM
lClose = comparable frequency, defined
by a tunable ε
distance.