24 Aug 2004
CS 5244: Indexing / Classification
Automatic Extraction of Metadata
- Rule-based scripts (fragile):
  - DC Dot demo: http://www.ukoln.ac.uk/cgi-bin/dcdot.pl
  - Still heavily cited and used!
- Wrapper induction: localized extraction
  - Define a local context and the features to match in it, then extract the matching text
- Text classification: whole-document classification
  - Use features drawn from the entire document to determine its class
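To make "rule-based scripts (fragile)" concrete, here is a minimal sketch of a hand-written extractor that pulls the page title and `<meta name=... content=...>` pairs out of raw HTML with regular expressions. It is illustrative only and is not how DC Dot itself works; `extract_metadata` and the sample page are hypothetical. The rules assume `name` precedes `content` and that values are double-quoted, so a page that swaps the attribute order silently loses a field. That is exactly the fragility the slide warns about.

```python
import re

# Hand-written rule 1: the document title (case-insensitive, may span lines).
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

# Hand-written rule 2: <meta name="..." content="..."> pairs.
# Assumption baked into the rule: name= comes first and values are double-quoted.
META_RE = re.compile(r'<meta\s+name="([^"]+)"\s+content="([^"]*)"', re.IGNORECASE)


def extract_metadata(html):
    """Hypothetical rule-based extractor returning a dict of metadata fields."""
    record = {}
    match = TITLE_RE.search(html)
    if match:
        record["title"] = match.group(1).strip()
    for name, content in META_RE.findall(html):
        record[name.lower()] = content
    return record


page = (
    "<html><head><title> Indexing Notes </title>"
    '<meta name="DC.creator" content="Jane Doe">'
    '<meta content="2004" name="DC.date">'  # reversed attribute order
    "</head></html>"
)

print(extract_metadata(page))
# → {'title': 'Indexing Notes', 'dc.creator': 'Jane Doe'}
```

The `DC.date` field is dropped without any error. Each rule works only for the markup its author anticipated.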
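Wrapper induction replaces hand-written rules with delimiters learned from labelled examples. The sketch below is a simplified single-field "left-right" (LR) wrapper in the style of Kushmerick's wrapper induction, under stated assumptions. The local context is the longest string shared by the text just before every labelled value (left delimiter) and the longest string shared by the text just after it (right delimiter). The function names and training pages are hypothetical, and a full LR learner would also validate the delimiters against every example.

```python
import os


def common_suffix(strings):
    # Longest string that every element ends with (common prefix of the reversals).
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def learn_lr_wrapper(examples):
    """Learn (left, right) delimiters from (page, value) training pairs."""
    befores, afters = [], []
    for page, value in examples:
        start = page.index(value)
        befores.append(page[:start])
        afters.append(page[start + len(value):])
    left = common_suffix(befores)           # shared context to the left
    right = os.path.commonprefix(afters)    # shared context to the right
    if not left or not right:
        raise ValueError("examples share no local context")
    return left, right


def apply_wrapper(wrapper, page):
    """Extract the text between the learned delimiters, or None if absent."""
    left, right = wrapper
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    return page[start:end] if end != -1 else None


train = [
    ("<li>Title: <b>Wrapper Induction</b> (1997)</li>", "Wrapper Induction"),
    ("<li>Title: <b>Dublin Core</b> [web]</li>", "Dublin Core"),
]
wrapper = learn_lr_wrapper(train)
print(wrapper)  # → ('<li>Title: <b>', '</b> ')
print(apply_wrapper(wrapper, "<li>Title: <b>Text Classification</b> (2002)</li>"))
# → Text Classification
```

Only the text around the target field matters. The rest of the page is ignored, which is what makes this extraction "localized".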
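Text classification, by contrast, uses evidence from the whole document. As one possible choice of classifier (the slide does not name one), the sketch below uses a multinomial Naive Bayes model over bag-of-words features with Laplace smoothing. Every token in the document contributes to the decision. The labels and four training sentences are illustrative toy data, and `train_nb` / `classify` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Bag-of-words features: lower-cased alphabetic tokens from the whole text.
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs):
    """docs: list of (text, label). Returns multinomial Naive Bayes counts."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(tokens(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def classify(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)                   # log prior
        denom = sum(word_counts[label].values()) + len(vocab)   # Laplace smoothing
        for w in tokens(text):
            if w in vocab:                                      # skip unseen words
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [  # toy data for illustration only
    ("goal scored in the final match of the league", "sports"),
    ("the team won the match after a late goal", "sports"),
    ("parliament passed the budget after a long debate", "politics"),
    ("the minister defended the budget in parliament", "politics"),
]
model = train_nb(train)
print(classify(model, "a late goal decided the league match"))  # → sports
print(classify(model, "parliament debate on the budget"))       # → politics
```

Unlike the wrapper, no single location in the text decides the label. The counts over all words do.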