The World Wide Web has become the biggest digital library available. The number of unique indexable web pages has exceeded 2 billion and is still growing at a substantial rate [Lyman & Varian, 2000]. With such rapid growth in the size of the Web, it is almost impossible for a user to find useful information or navigate effectively through the enormous number of Web documents. This increases the need for improved analysis of, and automated search over, the Web. There has been much research on different ways of analyzing the content and structure of the Web.
What is Information Seeking? 
Information seeking is the process engaged in by humans to change their state of knowledge.  It is a high level cognitive process that is part of learning or problem solving. To seek information implies the need to change the state of one’s knowledge.
Information retrieval is concerned with getting information from databases.
Searching is the behavioral manifestation of information seeking.
Thirty-one participants at Richard B. Harrison Library in Raleigh, North Carolina took part in the study. They were observed using the Internet and/or the Web catalog and interviewed before and after their sessions. Results identified four information seeking patterns, distinguished by the number of search approaches used. The approaches included linking, use of search engines, direct URL use, online catalog use, and searching within a website domain.
Current notions of online information seeking are shaped, in part, by our understanding of end-user interactions with traditional online catalogs. Numerous studies reveal users' conceptual and technical problems with online catalogs and foretell potential challenges with new technology.
The study in [Slone02] examines the commingling of online catalogs with the Internet and calls for an examination of the effects of this merger on end users. Specifically, it examined the influences of goals and user understandings on user search patterns.
The actual HTML content of a Web page can be analyzed to induce information about the page. The body text and the title of a Web page, for example, can be analyzed to determine whether the page is relevant to a certain domain. Typically, words and phrases that appear in the title or headings of the HTML structure, and key concepts extracted using indexing techniques, together with domain knowledge, can determine the relevance.
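As a minimal sketch of this content-based approach, the snippet below extracts text from the title and heading tags of a page and scores it against a domain keyword list. The sample page, keyword list, and scoring formula are all illustrative assumptions, not part of any system described above.

```python
# Content-based relevance sketch: score a page by how many domain
# keywords appear in its title and headings. Keywords and the sample
# page below are invented for illustration.
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Collect text appearing inside <title> and heading tags."""
    EMPHASIZED = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._stack = []           # currently open tags
        self.emphasized_text = []  # text found in emphasized tags

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1] in self.EMPHASIZED:
            self.emphasized_text.append(data.strip().lower())

def relevance_score(html, domain_keywords):
    """Fraction of domain keywords occurring in title/heading text."""
    parser = HeadingExtractor()
    parser.feed(html)
    text = " ".join(parser.emphasized_text)
    hits = sum(1 for kw in domain_keywords if kw in text)
    return hits / len(domain_keywords) if domain_keywords else 0.0

page = ("<html><head><title>Digital Libraries on the Web</title></head>"
        "<body><h1>Web indexing</h1><p>other text</p></body></html>")
score = relevance_score(page, ["web", "indexing", "libraries"])
```

A real system would also weight body text and apply stemming and stop-word removal, but the core idea of giving title and heading terms special weight is the same.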
 
Link-based approaches have drawn much attention in recent years. Web link structure has come to be used to infer important information about pages. This notion is based on the assumption that if the author of a Web page places a link to another Web page, he or she believes that the other page is relevant to his or her own [Henzinger, 2001]. On the other hand, because the Web is structured as a hypermedia system and authors link pages to pages, users can also surf the Web from one document to another along hypermedia links that contain relevant information meeting their needs [Huberman, 1998].
Sarukkai used a Markov chain model and eigenvector decomposition techniques for link prediction and path analysis, and to generate hubs. He chose Markov chains, which had been enormously successful in sequence matching and generation, as the mathematical model.
Sarukkai noted that the key to navigation lay in 'personalization', and he believed that this approach would lead to a satisfactory solution for navigating the huge World Wide Web.
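The core of this style of link prediction can be sketched as a first-order Markov chain: transition probabilities P(next page | current page) are estimated from observed click paths, and the most probable successor is predicted. The sample paths and page names below are invented for illustration; Sarukkai's full method additionally uses eigenvector decomposition, which is omitted here.

```python
# First-order Markov chain link prediction sketch.
# Transition probabilities are estimated by counting observed
# page-to-page transitions in navigation paths (paths are invented).
from collections import defaultdict

def build_transition_model(paths):
    """Estimate P(next_page | current_page) from click paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        model[cur] = {nxt: c / total for nxt, c in nexts.items()}
    return model

def predict_next(model, page):
    """Return the most probable next page, or None if page is unseen."""
    if page not in model:
        return None
    return max(model[page], key=model[page].get)

paths = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["home", "news", "sports"],
    ["search", "home", "news"],
]
model = build_transition_model(paths)
prediction = predict_next(model, "news")  # "sports": seen 2 of 3 times after "news"
```

Personalization, as emphasized above, would amount to estimating a separate transition model (or a mixture with a global model) from each individual user's history.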
The ultimate goal of information seeking is to provide people with fast, accurate, direct, and informative knowledge. Inspired by the bright future of this research area, some perspectives are worth mentioning.

Since the Web is so popular and the knowledge on it so abundant, there should be a way for Natural Language Processing research to make full use of the Web and enhance system performance; however, little research has been done at this intersection. Currently, the majority of information on the Web is text data. Speech, or even nonverbal signals (posture, expression), should also become inputs for an information seeking system, including the Web. Rather than the mental model, which is very hard to measure, we could measure physical indices (heart rate, pulse, temperature, brain waves, etc.) of a particular user; these would be very helpful for determining a user's mood and mental model. People begin to search the Internet because they are unsure about something, so ambiguous user requests or inputs are very common. What if a system could conduct an interview process, as a librarian in a traditional library does, asking a few questions to guide users in clarifying their own requests? Such a system would understand users' requirements better and hence provide better services.