Web Information Seeking
CS6210 survey presentation
Guo Shuqiao
Yang Hui
15 Oct 2003

Outline
Introduction
Review on Mental Model Study
Current Research Trends
Hyperlink Pattern Modeling
Web Information Seeking and Other Research Areas
Conclusion and Perspectives

Introduction
Information seeking
Berry Picking
The process engaged in by humans to change their state of knowledge
Web Information Seeking
Web:
The largest digital library available (> 2 billion pages)
A heterogeneous collection of information resources with minimal standards for selection, organization, and retrieval
Differs from a traditional digital library:
No real organization
No control over input
No control over the user population

Study on Mental Models
[Slone 02] examined how users’ mental models influence their searching behavior

Influencing Factors and Results
User’s understanding of the Web
User’s experience
Mental model
User’s expectations
User’s goals
Situational goals
Specific search goals
Format goals

Current Research Trends
Content-based approaches [Michael03]
HTML body text
Title and headings
Anchor text, etc.
Link-based approaches
Link structure reveals information about the pages it connects
Users' surfing behavior can be abstracted into patterns

Link Pattern Modeling - I
Markov chains model [Sarukkai 00]
Build a probability distribution over which of the previous links are good predictors of the next link
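A minimal sketch of next-link prediction, assuming a first-order model (the next click depends only on the current page) estimated by counting consecutive page pairs in click sessions. Sarukkai's model also weighs earlier links in the history, which this sketch omits; the session data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sessions):
    """Estimate P(next | current) by counting consecutive page pairs
    in each click session (maximum-likelihood estimate)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        probs[current] = {page: c / total for page, c in successors.items()}
    return probs


def predict_next(probs, current):
    """Return the most probable next page, or None if `current`
    was never followed by another click in the training data."""
    successors = probs.get(current)
    if not successors:
        return None
    return max(successors, key=successors.get)


# Hypothetical click sessions over pages A-D.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
P = estimate_transitions(sessions)
print(P["B"])                # {'C': 0.75, 'D': 0.25}
print(predict_next(P, "A"))  # B
```

Pages that only ever end a session (C and D here) have no outgoing estimates, which is one face of the training-data limitation listed later.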

Link Pattern Modeling - I
Markov chains model [Sarukkai 00]
Uses Markov chains and eigenvector decomposition techniques
A: matrix of transition probabilities between pages (states)
s(t): probability vector over all states at time t
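A sketch of how A and s(t) fit together, assuming the row-vector convention A[i][j] = P(next = j | current = i), so that s(t+1) = s(t) A and s(t+k) = s(t) A^k. The long-run (stationary) distribution is the left eigenvector of A with eigenvalue 1; this sketch swaps in power iteration instead of a full eigendecomposition, which finds that same dominant eigenvector for an irreducible, aperiodic chain. The 3-page matrix is hypothetical.

```python
def step(s, A):
    """One Markov step: s(t+1)[j] = sum_i s(t)[i] * A[i][j]."""
    n = len(A)
    return [sum(s[i] * A[i][j] for i in range(n)) for j in range(n)]


def propagate(s, A, k):
    """Distribution after k steps: s(t+k) = s(t) A^k."""
    for _ in range(k):
        s = step(s, A)
    return s


def stationary(A, tol=1e-12, max_iter=10_000):
    """Power iteration for pi = pi A (left eigenvector, eigenvalue 1).
    Starts from the uniform distribution and stops once successive
    iterates differ by less than `tol` in every component."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(pi, A)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical 3-page site; row i holds P(next page = j | current page = i).
A = [[0.0, 0.5, 0.5],
     [0.2, 0.0, 0.8],
     [0.6, 0.4, 0.0]]
s0 = [1.0, 0.0, 0.0]   # the user is currently on page 0
print(step(s0, A))     # [0.0, 0.5, 0.5]
print(stationary(A))   # long-run share of visits per page
```

For this matrix the stationary distribution works out to (34, 35, 45)/114, so page 2 attracts the most traffic in the long run.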

Link Pattern Modeling - II
Longest Repeating Subsequence Model [Pitkow 99]
Surfing paths are represented as n-grams <X1, X2, …, Xn>, i.e., sequences of page clicks made by a population of users visiting a website
Find the longest repeating subsequences (LRS)
E.g., 135 is the longest subsequence that repeats in 15135213544
Matches the prediction accuracy of the one-hop Markov model while reducing model complexity by nearly 33%
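A brute-force sketch of finding the longest repeating subsequence in a single click path, reproducing the 15135213544 example. This is a simplified reading, assuming "repeating" means a contiguous run of clicks occurring at least twice; Pitkow and Pirolli's LRS uses a frequency threshold over many users' paths and can keep several maximal subsequences, which this sketch does not.

```python
def longest_repeating_subsequence(path, min_repeats=2):
    """Return the longest contiguous run of clicks (as a tuple) that occurs
    at least `min_repeats` times in `path`, or () if nothing repeats.
    Lengths are tried from longest to shortest; occurrences may overlap."""
    n = len(path)
    for length in range(n - 1, 0, -1):
        counts = {}
        for start in range(n - length + 1):
            gram = tuple(path[start:start + length])  # one n-gram of clicks
            counts[gram] = counts.get(gram, 0) + 1
        repeated = [g for g, c in counts.items() if c >= min_repeats]
        if repeated:
            return repeated[0]
    return ()


# Each character stands for one page click, as in the slide's example.
print(longest_repeating_subsequence("15135213544"))  # ('1', '3', '5')
```

The same function accepts a list of page IDs; the quadratic number of candidate n-grams is acceptable for a sketch but not for large logs.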

Link Pattern Modeling - III
Applications of Link Pattern Modeling
Link prediction and prefetching
Agent-assisted navigation
Web Community
Website Organization and Optimization
Personalization
Limitations
Model quality depends on the amount of training data available
Dimensionality (the Markov chain transition matrix has one row and one column per page, so it is typically very large)
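One common way to ease the dimensionality problem is to store only the transitions actually observed instead of a dense n x n matrix. The sketch below, with a hypothetical class name and a hypothetical `threshold` parameter, keeps sparse counts and uses them for the prefetching application: pages whose estimated next-click probability reaches the threshold are returned as prefetch candidates.

```python
from collections import Counter, defaultdict


class SparseMarkovModel:
    """First-order link model stored sparsely: only transitions seen in
    training sessions are kept, instead of a dense n x n matrix."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sessions):
        for session in sessions:
            for current, nxt in zip(session, session[1:]):
                self.counts[current][nxt] += 1

    def stored_entries(self):
        """Number of non-zero transitions actually stored."""
        return sum(len(succ) for succ in self.counts.values())

    def prefetch_candidates(self, current, threshold=0.3):
        """Pages with estimated P(next | current) >= threshold,
        most probable first; [] for pages with no observed successor."""
        succ = self.counts.get(current)
        if not succ:
            return []
        total = sum(succ.values())
        ranked = sorted(succ.items(), key=lambda kv: kv[1], reverse=True)
        return [page for page, c in ranked if c / total >= threshold]


# Hypothetical sessions over 5 pages: a dense matrix would need 25 cells.
sessions = [["home", "news", "sports"], ["home", "news", "weather"],
            ["home", "news", "sports"], ["home", "mail"]]
model = SparseMarkovModel()
model.train(sessions)
print(model.stored_entries())               # 4
print(model.prefetch_candidates("news"))    # ['sports', 'weather']
```

Raising the threshold trades fewer wasted prefetches for more cache misses; the right value depends on bandwidth and is not specified by the surveyed work.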

Related Research Areas
Information Retrieval
Search engines
PageRank
HITS
Information Extraction
Web Topic detection
Data Mining
Data warehousing
Web log analysis
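PageRank, listed above, is the best-known use of link structure in search engines. A minimal power-iteration sketch of the standard formulation, assuming the commonly cited damping factor 0.85 and one common convention for dangling pages (pages with no out-links spread their rank uniformly); the three-page graph is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> rank; ranks sum to 1."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank evenly over its out-links.
        for p, targets in links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B']
```

C ranks highest because it receives links from both A and B; HITS would instead compute separate hub and authority scores.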

Conclusion and Perspectives
Conclusion
Quick Review of Web Information Seeking
Mental Model Study
Current Research Trends
Relationship to Other Research Areas
Perspectives
Natural Language Processing
Multimedia Interaction
Digital Library Interview
Life-long Assisted Education
Persistence and Web Security

References
[Slone 02] Slone, D. J. (2002). The Influence of Mental Models and Goals on Search Patterns During Web Interaction. JASIST 53(13):1152-1169.
[Sarukkai 00] Sarukkai, R. R. (2000). Link Prediction and Path Analysis Using Markov Chains. WWW9.
[Pitkow 99] Pitkow, J. and Pirolli, P. (1999). Mining Longest Repeating Subsequences to Predict WWW Surfing. USITS '99.
