the actual HTML content of a Web page is analyzed to infer information about the page. The body text and the title of a Web page, for example, can be analyzed to determine whether the page is relevant to a particular domain. Typically, the words and phrases that appear in the title or headings of the HTML structure are used to determine relevance, together with key concepts extracted using indexing techniques and with domain knowledge.
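As a rough illustration of content-based relevance, the sketch below parses a page with Python's standard `html.parser`, splits its text into title, heading, and body fields, and scores the page by the weighted share of tokens that match a set of domain terms. Several assumptions are made here: domain knowledge is reduced to a hand-picked keyword set, the field weights (`FIELD_WEIGHTS`) are hypothetical values chosen for illustration, and key-concept extraction via indexing is not modeled. The names `FieldTextParser` and `relevance_score` are likewise hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical field weights (an assumption for illustration): title terms count
# most, heading terms next, ordinary body text least.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "wbr"}
SKIP_TAGS = {"script", "style"}


class FieldTextParser(HTMLParser):
    """Collects page text into title, heading, and body buckets."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "heading": [], "body": []}
        self._open = []  # stack of currently open (non-void) tags

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self._open:
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIP_TAGS for t in self._open):
            return  # ignore script and stylesheet contents
        if "title" in self._open:
            self.fields["title"].append(data)
        elif any(t in HEADING_TAGS for t in self._open):
            self.fields["heading"].append(data)
        else:
            self.fields["body"].append(data)


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance_score(html, domain_terms):
    """Weighted share of tokens that are domain terms, in [0, 1]."""
    parser = FieldTextParser()
    parser.feed(html)
    parser.close()
    hits = total = 0.0
    for field, chunks in parser.fields.items():
        weight = FIELD_WEIGHTS[field]
        for token in tokenize(" ".join(chunks)):
            total += weight
            if token in domain_terms:
                hits += weight
    return hits / total if total else 0.0


page = ("<html><head><title>Focused crawler design</title></head><body>"
        "<h1>Crawler basics</h1><p>A crawler fetches pages about sports.</p>"
        "</body></html>")
print(round(relevance_score(page, {"crawler", "focused"}), 3))  # prints 0.474
```

Weighting title and heading terms above body terms is one simple way to reflect the observation that those parts of the HTML structure are especially indicative of a page's topic; a real system would tune the weights or replace the keyword set with richer domain knowledge.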

Link-based approaches have drawn much attention in recent years. The link structure of the Web has come to be used to infer important information about pages. This notion is based on the assumption that if the author of a Web page places a link to another Web page, he or she believes that the linked page is relevant to his or her own page [Henzinger, 2001]. In addition, because the Web is structured as a hypermedia system in which authors link pages to other pages, users can surf the Web from one document to another along hypermedia links, reaching documents that contain relevant information meeting their needs [Huberman, 1998].
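The link-based assumption above can be sketched as a frontier-ranking step: an unvisited URL inherits an estimated relevance from the already-scored pages that link to it. The rule used here (the mean score of the linking pages) is an illustrative choice, not the specific method of the cited works, and `rank_frontier`, `page_scores`, and `links` are hypothetical names. The scores could come from a content-based scorer such as the one discussed for page content.

```python
from collections import defaultdict


def rank_frontier(page_scores, links):
    """Rank unvisited URLs by the mean relevance of visited pages linking to them.

    page_scores: visited URL -> relevance score (e.g. from a content-based scorer).
    links: visited URL -> list of outgoing link URLs found on that page.
    Returns (url, estimated_score) pairs, most promising first.
    """
    inherited = defaultdict(list)
    for source, targets in links.items():
        if source not in page_scores:
            continue  # no relevance evidence for this source page yet
        for target in set(targets):  # count each source at most once per target
            if target not in page_scores:  # only unvisited pages enter the frontier
                inherited[target].append(page_scores[source])
    estimates = {url: sum(s) / len(s) for url, s in inherited.items()}
    # Highest estimate first; ties broken alphabetically for a stable order.
    return sorted(estimates.items(), key=lambda item: (-item[1], item[0]))


visited = {"a.html": 0.9, "b.html": 0.2}
outlinks = {"a.html": ["c.html", "d.html"], "b.html": ["d.html", "e.html"]}
for url, score in rank_frontier(visited, outlinks):
    print(url, round(score, 2))
```

In this example `c.html` is ranked first because it is linked only from the highly relevant `a.html`, `d.html` follows with an averaged estimate of 0.55, and `e.html` comes last. Averaging is just one option; taking the maximum, or weighting by anchor text, would be equally plausible variants.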