• If a document consists of just a subset of another document, the standard VS model may show low similarity.
  - Example: Cosine(D1, D2) = .61
      D1: <A, B, C>
      D2: <A, B, C, D, E, F, G, H>
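The .61 in the example can be reproduced with a short sketch. It assumes binary term weights (each distinct word counts once), which the slide does not state but which matches the quoted value; the function name `cosine` is illustrative only.

```python
import math

def cosine(doc_a, doc_b):
    """Cosine similarity of two documents given as word lists.

    Assumption: binary term weights, so each vector has a 1 for every
    distinct word in the document and 0 elsewhere.
    """
    terms_a, terms_b = set(doc_a), set(doc_b)
    dot = len(terms_a & terms_b)                      # shared words
    norm = math.sqrt(len(terms_a)) * math.sqrt(len(terms_b))
    return dot / norm if norm else 0.0

d1 = ["A", "B", "C"]
d2 = ["A", "B", "C", "D", "E", "F", "G", "H"]

# 3 / (sqrt(3) * sqrt(8)) = 3 / sqrt(24)
print(round(cosine(d1, d2), 2))  # 0.61
```

Every word of D1 appears in D2, yet the score is well below 1 because the five extra words in D2 inflate its vector length.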
• Shivakumar and Garcia-Molina (95): use only close words in VSM.
  - Close = _________________, defined by a tunable ε distance.
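A rough sketch of how restricting the comparison to close words could change the D1/D2 result. The slide leaves the definition of "close" blank, so the criterion used here is purely an assumption for illustration: a word is treated as close when it occurs in both documents and its two frequencies satisfy ε − (f1/f2 + f2/f1) > 0. The asymmetric normalization by one document's own length and the names `close_words`, `subset_measure`, and `similarity` are likewise hypothetical, not taken from the slide.

```python
from collections import Counter

def close_words(r, s, eps):
    """Words present in both documents with similar frequencies.

    ASSUMED criterion (the slide leaves the definition blank):
    eps - (f_r / f_s + f_s / f_r) > 0, with eps a tunable threshold.
    """
    close = set()
    for w in r.keys() & s.keys():
        ratio = r[w] / s[w] + s[w] / r[w]
        if eps - ratio > 0:
            close.add(w)
    return close

def subset_measure(d1, d2, eps):
    """Overlap of d1 with d2 over close words, normalized by d1 alone (assumption)."""
    close = close_words(d1, d2, eps)
    num = sum(d1[w] * d2[w] for w in close)
    den = sum(f * f for f in d1.values())
    return num / den if den else 0.0

def similarity(r, s, eps=2.5):
    """Take the larger of the two directional measures, capped at 1."""
    return min(1.0, max(subset_measure(r, s, eps), subset_measure(s, r, eps)))

r = Counter(["A", "B", "C"])
s = Counter(["A", "B", "C", "D", "E", "F", "G", "H"])
print(similarity(r, s))  # 1.0
```

Under these assumptions the subset document scores 1.0 against its superset, because the words that appear only in the longer document no longer enter the normalization for the shorter one.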