Wei Lu and Min-Yen Kan
AIRS 2005 (Jeju Island, Korea)
3/22
Categorization Scheme
• Summarized 32 functionality-based categories
• Based on 1,637 instances from the WT10G corpus
• Carefully hand-annotated by one of the authors, taking 40 hours
[Figure: distribution of instances across categories; labels shown: Dynamic Banner, Web Application, Calendar]
We have summarized 32 functionality-based categories, derived from instances selected from the WT10G corpus. WT10G is the standard corpus used in TREC (Text REtrieval Conference) evaluations. All of the instances were carefully annotated by hand.
This distribution graph shows the number of instances in each category. For example, 264 instances are annotated as “Dynamic Banner”, 62 as “Web Application”, and 5 as “Calendar”.
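To put the quoted counts in proportion, here is a minimal sketch that converts them into shares of the 1,637 annotated instances. Only the three counts mentioned in the talk are used; the counts for the other 29 categories are not given here, so they appear only as a combined remainder. The names `quoted_counts` and `share` are illustrative and do not come from the authors.

```python
# Total number of annotated instances, as stated on the slide.
TOTAL_INSTANCES = 1637

# The only per-category counts quoted in the talk (3 of the 32 categories).
quoted_counts = {
    "Dynamic Banner": 264,
    "Web Application": 62,
    "Calendar": 5,
}


def share(count, total=TOTAL_INSTANCES):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Percentage share of each quoted category, rounded to one decimal place.
shares = {name: round(share(n), 1) for name, n in quoted_counts.items()}

# Instances belonging to the categories that the talk does not enumerate.
remaining = TOTAL_INSTANCES - sum(quoted_counts.values())

for name, pct in shares.items():
    print(f"{name}: {quoted_counts[name]} instances ({pct}%)")
print(f"Other categories combined: {remaining} instances")
```

By this arithmetic, the largest quoted class covers roughly a sixth of the data, while the smallest covers well under one percent, which suggests the distribution is strongly skewed.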