Yee Fan Tan, Min-Yen Kan and Dongwon Lee: Search Engine Driven Author Disambiguation
ACM/IEEE Joint Conference on Digital Libraries 2006
Weighting: Inverse Host Frequency (IHF)
• Observation
– Not all URLs are equally useful
– e.g., aggregator services
• Desired weighting scheme
– Low weights to aggregator web sites
– High weights to personal and group publication pages
• Inverse Host Frequency (IHF)
– Similar to Inverse Document Frequency (IDF) in information retrieval
• Consider citations of top 100 authors in DBLP (by number of citations)
• For each such citation, query a search engine with its title to obtain URLs, then truncate them to their hostnames
• If a hostname h has frequency f(h), then its IHF is
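
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes an IDF-style form ihf(h) = log(N / f(h)), where f(h) counts the returned URLs whose hostname is h and N is the total number of returned URLs. That formula, the function names, and the sample URLs are all assumptions made for illustration.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def hostname(url):
    """Truncate a URL to its lowercase hostname ('' if none can be parsed)."""
    return urlparse(url).hostname or ""


def inverse_host_frequency(urls):
    """Map each hostname h to log(N / f(h)).

    ASSUMPTION: f(h) is the number of URLs whose hostname is h and N is the
    total number of URLs. This IDF-style form is illustrative only.
    """
    counts = Counter(h for h in map(hostname, urls) if h)
    total = sum(counts.values())
    return {h: math.log(total / f) for h, f in counts.items()}


# Hypothetical search results collected for a handful of paper titles.
sample_urls = [
    "http://aggregator.example.org/paper/1",
    "http://aggregator.example.org/paper/2",
    "http://aggregator.example.org/paper/3",
    "http://www.example.edu/~alice/pubs.html",
]
weights = inverse_host_frequency(sample_urls)
```

Under this assumed form, a host that appears for many titles (the hypothetical aggregator) receives a lower weight than a host that appears rarely (the hypothetical personal publication page), which matches the desired weighting scheme.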