Scalability
• Not scalable:
  • A large number of Web accesses
  • Network traffic and load on the search engine and Web sites
• Solutions:
  • A better blocking scheme
  • A local snapshot of the Web
    ▪ Stanford WebBase Project
    ▪ ~100 million Web pages from >50,000 sites, including many .edu domains
    ▪ Downloaded half of the data and filtered it
    ▪ Local snapshot containing 3.5 million relevant pages
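
As a rough illustration of the local-snapshot idea, the sketch below filters a crawl down to relevant pages once and then answers lookups from that in-memory copy, so no query touches the network. The function names (`build_snapshot`, `count_hits`), the keyword-based relevance test, and the sample pages are all assumptions made for illustration; the slides do not say how the WebBase data was filtered or queried.

```python
from typing import Dict, Iterable, Tuple


def build_snapshot(pages: Iterable[Tuple[str, str]],
                   keywords: Iterable[str]) -> Dict[str, str]:
    """Keep only pages whose text mentions at least one keyword.

    The keyword match is a hypothetical relevance test, not the original filter.
    """
    wanted = [k.lower() for k in keywords]
    snapshot = {}
    for url, text in pages:
        lowered = text.lower()
        if any(k in lowered for k in wanted):
            snapshot[url] = lowered
    return snapshot


def count_hits(snapshot: Dict[str, str], phrase: str) -> int:
    """Count snapshot pages containing the phrase (local stand-in for a search-engine hit count)."""
    needle = phrase.lower()
    return sum(1 for text in snapshot.values() if needle in text)


# Made-up sample crawl: (url, page text) pairs.
crawl = [
    ("http://a.example.edu/p1", "Proceedings of the database conference"),
    ("http://b.example.com/p2", "Cooking recipes and kitchen tips"),
    ("http://c.example.edu/p3", "Database systems course syllabus"),
]

snapshot = build_snapshot(crawl, keywords=["database"])
print(len(snapshot))                       # number of pages kept after filtering
print(count_hits(snapshot, "conference"))  # answered without any Web access
```

The filtering cost is paid once when the snapshot is built; every later lookup is a local scan, which is what removes the per-query network traffic and search-engine load listed above.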
WIDM 2007