the actual HTML content of a
Web page is analyzed to infer information about the page. The body text and
the title of a Web page, for example, can be analyzed to determine whether
the page is relevant to a certain domain. Typically, the words and phrases
that appear in the title or headings of the HTML structure, together with key
concepts extracted using indexing techniques and domain knowledge, determine
the relevance.
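A minimal Python sketch of such a content-based check follows. It assumes that domain knowledge is reduced to a hand-picked set of domain terms and that plain word tokenization stands in for a real indexing technique. The class `SectionTextParser`, the function `relevance_score`, and the field weights are hypothetical illustrations, not the method of any particular system.

```python
import re
from html.parser import HTMLParser

class SectionTextParser(HTMLParser):
    """Hypothetical helper: splits page text into title, heading, and body text."""
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements
        self.title, self.headings, self.body = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:  # void elements never get a closing tag
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:  # pop back to the matching open tag
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self.stack:
            self.title.append(data)
        elif any(t in self.HEADINGS for t in self.stack):
            self.headings.append(data)
        elif "script" not in self.stack and "style" not in self.stack:
            self.body.append(data)

TOKEN_RE = re.compile(r"[a-z0-9]+")

# Assumed weights: title matches count most, then headings, then body text.
WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}

def relevance_score(html, domain_terms, weights=WEIGHTS):
    """Weighted count of domain-term occurrences in each part of the page."""
    parser = SectionTextParser()
    parser.feed(html)
    parser.close()
    terms = {t.lower() for t in domain_terms}
    score = 0.0
    for field in ("title", "headings", "body"):
        tokens = TOKEN_RE.findall(" ".join(getattr(parser, field)).lower())
        score += weights[field] * sum(1 for tok in tokens if tok in terms)
    return score

page = ("<html><head><title>Protein folding</title></head><body>"
        "<h1>Folding kinetics</h1><p>Protein structure and folding.</p></body></html>")
print(relevance_score(page, {"protein", "folding", "structure"}))  # 11.0
```

Weighting title and heading matches above body matches encodes the idea that these parts of the HTML structure are stronger relevance signals; comparing the score against a chosen threshold would then decide whether the page belongs to the domain.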
Link-based approaches have
drawn much attention in recent years. The link structure of the Web is used
to infer important information about pages. This notion is based on the
assumption that when the author of a Web page places a link to another Web
page, he or she believes that the linked page is relevant to his or her own
page [Henzinger, 2001]. Moreover, because the Web is structured as a
hypermedia system in which authors link pages to other pages, users can also
surf the Web from one document to another along hypermedia links to reach
documents containing relevant information that meets their needs
[Huberman, 1998].
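A minimal Python sketch of reading link structure this way follows. It assumes the pages and their outgoing links are already available as an in-memory dictionary and that a few seed pages are known to be relevant. The function `link_endorsements`, the `max_depth` parameter, and the example graph are hypothetical; counting links from pages reached by following hyperlinks is a deliberately crude stand-in, not the specific method of either cited work.

```python
from collections import Counter, deque

def link_endorsements(graph, seeds, max_depth=2):
    """Walk the link graph breadth-first from pages assumed relevant and count,
    for every page reached, how many distinct expanded pages link to it.

    graph: hypothetical adjacency dict {page: [outgoing link targets]}.
    seeds: pages already judged relevant (an assumption of this sketch).
    """
    depth = {page: 0 for page in seeds}  # also de-duplicates the seeds
    queue = deque(depth)
    endorsements = Counter()
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # do not expand pages at the depth limit
        for target in set(graph.get(page, [])):  # count each link once per source
            if target == page:
                continue  # ignore self-links
            # A link is read as its author's judgment that the target is relevant.
            endorsements[target] += 1
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)  # "surf" onward along the hyperlink
    return endorsements

graph = {"A": ["C", "D"], "B": ["C"], "C": ["E"], "D": [], "E": ["A"]}
print(link_endorsements(graph, ["A", "B"]).most_common(1))  # [('C', 2)]
```

Page C is linked from both seed pages, so it collects the most endorsements. The breadth-first walk mirrors a user moving from document to document along links, while the counter records the relevance judgments that those links are assumed to express.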