13 Nov 2004
WIDM 04: Lee et al. Co-training Web Block Classification
Evaluations
•Adapted co-training:
–Sample balancing: preserve the ratio of noisily labeled examples; performance is poor without it
–Replace unlabeled data at each round
•Use BoosTexter: handles word features easily
•Five-fold cross-validation
•General performance?
•Specific performance on:
–Fine-grained classification?
–XHTML / DIV pages?
–Others’ tasks?
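The two adaptations listed under "Adapted co-training" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: "sample balancing" is interpreted here as keeping the class proportions of self-labeled examples equal to those of the seed labeled set, and "replace unlabeled data" as drawing a fresh pool each round. All names (`class_quota`, `balanced_pick`, `co_train`, `train_centroid`) are hypothetical, and a toy nearest-mean learner stands in for BoosTexter.

```python
import random
from collections import Counter

def class_quota(labels, k):
    """Split k picks across classes in proportion to their counts in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: max(1, round(k * n / total)) for c, n in counts.items()}

def balanced_pick(predictions, quota):
    """predictions: list of (confidence, example_id, label).
    Keep the most confident predictions, at most quota[label] per class."""
    taken, picked = Counter(), []
    for conf, ex, label in sorted(predictions, key=lambda p: p[0], reverse=True):
        if taken[label] < quota.get(label, 0):
            picked.append((ex, label))
            taken[label] += 1
    return picked

def train_centroid(examples):
    """Toy stand-in learner (NOT BoosTexter): nearest class mean on one number."""
    sums, counts = Counter(), Counter()
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in counts}

    def predict(x):
        ranked = sorted(means, key=lambda y: abs(x - means[y]))
        best = ranked[0]
        # Confidence = margin between the closest and second-closest class mean.
        margin = abs(x - means[ranked[1]]) - abs(x - means[best]) if len(ranked) > 1 else 0.0
        return best, margin

    return predict

def co_train(labeled, unlabeled, train_fns, rounds=5, pool_size=20, k=4, seed=0):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b).
    train_fns: one trainer per view, each mapping [(view, label)] to a predictor."""
    rng = random.Random(seed)
    labeled = list(labeled)
    remaining = list(range(len(unlabeled)))
    # Assumption: the target class ratio is taken from the seed labeled set.
    quota = class_quota([y for _, y in labeled], k)
    for _ in range(rounds):
        if not remaining:
            break
        # Assumption: a fresh pool is drawn every round instead of topping up the old one.
        pool = rng.sample(remaining, min(pool_size, len(remaining)))
        newly = {}
        for v, train in enumerate(train_fns):
            predict = train([(x[v], y) for x, y in labeled])
            preds = []
            for i in pool:
                label, conf = predict(unlabeled[i][v])
                preds.append((conf, i, label))
            for i, label in balanced_pick(preds, quota):
                newly.setdefault(i, label)  # first view to claim an example wins
        labeled.extend((unlabeled[i], label) for i, label in newly.items())
        remaining = [i for i in remaining if i not in newly]
    return labeled
```

A call such as `co_train(seed_examples, unlabeled_blocks, [train_centroid, train_centroid])` returns the seed examples plus the self-labeled ones; in practice each entry of `train_fns` would wrap a real learner over one feature view.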