Min-Yen Kan and Danny C. C. Poo
Known Item Queries (JCDL 2005)
Bootstrapping
•A corpus of only 320 annotated instances is small for constructing a language model
–Usual language models use millions of examples
•Try bootstrapping a model
–Take a sample’s annotation and apply it to all the instances that sample represents
–More data, but also more noise
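A minimal sketch of the bootstrapping idea above, under stated assumptions: each annotated sample stands for a group of unannotated instances, its label is copied to every instance in that group (the noisy self-labeling step), and a bigram model is then counted per label. The function names, the `groups` mapping, the label names `known_item` and `subject`, and the add-one smoothing are all illustrative choices, not details taken from the slides.

```python
from collections import Counter, defaultdict
import math


def self_label(annotated, groups):
    """Copy each annotated sample's label to every instance it represents.

    annotated: hypothetical mapping sample_id -> label
    groups:    hypothetical mapping sample_id -> list of token lists
    """
    corpus = []
    for sample_id, label in annotated.items():
        for tokens in groups.get(sample_id, []):
            corpus.append((tokens, label))  # noisy: label may not fit every instance
    return corpus


def train_bigram(corpus):
    """Count bigrams per label, with sentence boundary markers."""
    bigrams = defaultdict(Counter)   # label -> Counter of (prev, cur)
    contexts = defaultdict(Counter)  # label -> Counter of prev
    vocab = set()
    for tokens, label in corpus:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[label][(prev, cur)] += 1
            contexts[label][prev] += 1
    return bigrams, contexts, vocab


def log_prob(tokens, label, model):
    """Add-one smoothed bigram log-probability of tokens under a label."""
    bigrams, contexts, vocab = model
    padded = ["<s>"] + list(tokens) + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        count = bigrams[label][(prev, cur)]
        total += math.log((count + 1) / (contexts[label][prev] + v))
    return total


# Toy data (invented for illustration only).
annotated = {"q1": "known_item", "q2": "subject"}
groups = {
    "q1": [["harry", "potter"], ["harry", "potter", "rowling"]],
    "q2": [["organic", "chemistry"]],
}
corpus = self_label(annotated, groups)
model = train_bigram(corpus)
```

With this toy data, `log_prob(["harry", "potter"], "known_item", model)` comes out higher than the same query scored under `"subject"`, which is the kind of comparison a per-label bigram model supports. A mislabeled group would propagate its error to every instance it covers, which is the noise the slide warns about.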
[Diagram: Original annotated corpus → Bigram Lang. Model; Original annotated corpus → Self-label (noisy process) → Self-labeled corpus (290K instances) → Bigram Lang. Model]