13 Nov 2004
WIDM 04: Lee et al. Co-training Web Block Classification
Stylistic Features
•Layout: guess from first-level DOM nodes
–Linear
–<Table>: Use reading order, cell type propagation
–XHTML / CSS (e.g., <DIV>): Translate relative to absolute positioning, model depth
•Font (CSS too): relative features
•Image size
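A rough sketch of how the layout guess and the relative features above might be computed. This is an illustration only, not the implementation from the paper: the names `guess_layout`, `absolute_position`, and `relative_font_size` are hypothetical, the three layout labels simply mirror the bullets on this slide, and the parser assumes reasonably well-formed markup (every non-void tag is closed).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FirstLevelTags(HTMLParser):
    """Collect the tag names of the direct children of <body>."""

    def __init__(self):
        super().__init__()
        self.depth = None  # nesting depth below <body>; None while outside it
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.depth = 0
            return
        if self.depth is None:
            return
        if self.depth == 0:
            self.tags.append(tag)  # a first-level DOM node
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth is None:
            return
        if tag == "body":
            self.depth = None
        elif tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def guess_layout(html):
    """Assumed heuristic: label the page by its first-level nodes only."""
    parser = FirstLevelTags()
    parser.feed(html)
    if "table" in parser.tags:
        return "table"
    if "div" in parser.tags:
        return "css"
    return "linear"


def absolute_position(offsets):
    """Sum (left, top) offsets from outermost ancestor to the block itself.

    Returns the absolute (left, top) pair and the nesting depth.
    """
    left = sum(dx for dx, _ in offsets)
    top = sum(dy for _, dy in offsets)
    return (left, top), len(offsets)


def relative_font_size(block_px, base_px):
    """Font size of a block relative to the page's base font size."""
    return block_px / base_px
```

For example, `guess_layout("<body><div><table></table></div></body>")` returns `"css"`, because the nested table is not a first-level node, and `absolute_position([(10, 20), (5, 5)])` returns `((15, 25), 2)`.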