|
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
¡ |
Problem: If a
document consists is just a
|
|
|
subset of
another document, standard VS
|
|
model may show
low similarity
|
|
|
|
l |
Example:
cosine (D1,D2) = .61
|
|
|
|
|
D1:
<A, B, C>,
|
|
|
D2:
<A, B, C, D, E, F, G, H>
|
|
|
¡ |
Shivakumar and
Garcia-Molina (95): use
|
|
|
only close words
in VSM
|
|
|
|
l |
Close = comparable
frequency, defined by a
|
|
|
tunable ε distance.
|
|
|
|