Not scalable:
  - A large number of Web accesses
  - Network traffic and load on search engines and web sites

Solutions:
  - A better blocking scheme
  - A local snapshot of the Web
      - Stanford WebBase Project
      - ~100 million web pages from >50,000 sites, including many .edu domains
      - Downloaded half of the data and filtered it
      - Local snapshot containing 3.5 million relevant pages
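
The local-snapshot idea can be sketched in a few lines of Python: filter a crawl once, keep only the relevant pages, and answer later lookups from that local copy instead of going back to the Web. This is a minimal illustration only. The relevance test (a keyword match), the function names, and the sample URLs are all assumptions made for the example; the filter actually applied to the WebBase data is not described here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Page = Tuple[str, str]  # (url, page text)


def mentions_any(keywords: List[str]) -> Callable[[str, str], bool]:
    """Hypothetical relevance test: keep a page if it mentions any keyword."""
    lowered = [k.lower() for k in keywords]

    def predicate(url: str, text: str) -> bool:
        body = text.lower()
        return any(k in body for k in lowered)

    return predicate


def build_local_snapshot(
    pages: Iterable[Page],
    is_relevant: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Pass over the crawl once and keep only the relevant pages locally."""
    return {url: text for url, text in pages if is_relevant(url, text)}


# Toy stand-in for a crawl; the real input would be a repository dump.
crawl = [
    ("http://example.edu/a", "Course notes on data mining"),
    ("http://example.com/b", "Recipes for dinner"),
    ("http://example.edu/c", "Web mining and search engines"),
]

snapshot = build_local_snapshot(crawl, mentions_any(["mining"]))

# Later lookups read the local snapshot; no further Web accesses are needed.
for url in sorted(snapshot):
    print(url)
```

Once the snapshot is built, repeated queries cost only local reads, which removes the network traffic and the load on search engines and web sites listed above as the scalability problem.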