• Web Page Contents
• Use the top-k returned Web pages for each entity
• Represent each set by a virtual document
• Some heuristics
  – D(m): the top m (≤ k) documents are concatenated
  – T(all, n): the top n tokens with the highest weight from all top-k Web pages
  – Snippet(m): snippets of the top m (≤ k) Web pages
  – Probabilistic language model: KL-divergence
• sim(ec, ei) = doc_sim(vdoc(ec), vdoc(ei))
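The virtual-document idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`vdoc_d`, `vdoc_t`, `cosine_sim`, `kl_divergence`), the whitespace tokenizer, raw term frequency as the token weight, cosine similarity as `doc_sim`, and add-alpha smoothing for the language model are all assumptions made for the example. Pages are assumed to arrive as plain strings, already ranked by the search engine.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def vdoc_d(pages, m):
    """D(m): concatenate the tokens of the top m ranked pages."""
    return [tok for page in pages[:m] for tok in tokenize(page)]


def vdoc_t(pages, n):
    """T(all, n): keep the n highest-weighted tokens over all pages.

    Assumption: the weight is raw term frequency; the slides do not say
    which weighting scheme is used.
    """
    counts = Counter(tok for page in pages for tok in tokenize(page))
    return [tok for tok, _ in counts.most_common(n)]


def cosine_sim(tokens_a, tokens_b):
    """One possible doc_sim: cosine similarity of term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def kl_divergence(tokens_p, tokens_q, alpha=1.0):
    """KL(P || Q) between add-alpha smoothed unigram language models.

    Lower values mean more similar virtual documents. The smoothing
    method is an assumption made to keep every probability non-zero.
    """
    p, q = Counter(tokens_p), Counter(tokens_q)
    vocab = set(p) | set(q)
    total_p = sum(p.values()) + alpha * len(vocab)
    total_q = sum(q.values()) + alpha * len(vocab)
    kl = 0.0
    for t in vocab:
        pt = (p[t] + alpha) / total_p
        qt = (q[t] + alpha) / total_q
        kl += pt * math.log(pt / qt)
    return kl


def sim(pages_c, pages_i, m=2):
    """sim(ec, ei) = doc_sim(vdoc(ec), vdoc(ei)), here with D(m) and cosine."""
    return cosine_sim(vdoc_d(pages_c, m), vdoc_d(pages_i, m))
```

Swapping `vdoc_d` for `vdoc_t`, or `cosine_sim` for a score derived from `kl_divergence`, gives the other heuristic combinations listed above; the Snippet(m) variant would use the same code with search-result snippets passed in place of full page text.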
WIDM 2007