• Web Page Contents
• Use the top-k returned Web pages for each entity
• Represent each set by a Virtual Document
• Some heuristics:
  ▪ D(m): the top m (≤ k) documents are concatenated
  ▪ T(all, n): the top n tokens with the highest weight from all top-k web pages
  ▪ Snippet(m): snippets of the top m (≤ k) web pages
  ▪ Probabilistic Language Model: KL-divergence
• sim(e_c, e_i) = doc_sim(vdoc(e_c), vdoc(e_i))
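
The heuristics and the similarity above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the page format (dicts with hypothetical "text" and "snippet" keys), whitespace tokenization, raw term frequency as the token weight in T(all, n), cosine as one possible doc_sim, and additive smoothing in the KL-divergence are all assumptions made here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization.
    return text.lower().split()


def vdoc_d(pages, m):
    """D(m): concatenate the text of the top m pages."""
    return " ".join(p["text"] for p in pages[:m])


def vdoc_snippet(pages, m):
    """Snippet(m): concatenate the snippets of the top m pages."""
    return " ".join(p["snippet"] for p in pages[:m])


def vdoc_t(pages, n):
    """T(all, n): keep the n highest-weight tokens over all pages.

    Assumption: weight = raw term frequency; the slide does not say
    which weighting scheme is used.
    """
    counts = Counter(tokenize(" ".join(p["text"] for p in pages)))
    return dict(counts.most_common(n))


def cosine_sim(a, b):
    """One possible doc_sim: cosine between two term-weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kl_divergence(a, b, mu=0.01):
    """KL(P_a || P_b) between smoothed unigram language models.

    Assumption: additive smoothing with constant mu, so that no token
    has zero probability. Lower values mean more similar documents.
    """
    vocab = set(a) | set(b)
    total_a = sum(a.values()) + mu * len(vocab)
    total_b = sum(b.values()) + mu * len(vocab)
    kl = 0.0
    for t in vocab:
        p = (a.get(t, 0) + mu) / total_a
        q = (b.get(t, 0) + mu) / total_b
        kl += p * math.log(p / q)
    return kl


def sim(pages_c, pages_i, m=2, doc_sim=cosine_sim):
    """sim(e_c, e_i) = doc_sim(vdoc(e_c), vdoc(e_i)), using D(m) as vdoc."""
    v_c = Counter(tokenize(vdoc_d(pages_c, m)))
    v_i = Counter(tokenize(vdoc_d(pages_i, m)))
    return doc_sim(v_c, v_i)
```

Because KL-divergence is a distance rather than a similarity, one option is to pass `doc_sim=lambda a, b: -kl_divergence(a, b)` to `sim` so that larger values still mean closer entities.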