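[Table: tagging rule after generalization. Column headers: Word index; Condition (Word, Lemma, LexCat, Case, SemCat); Associated information; Action (Tag). Surviving cell values by word index: 3: At; 4: Digit; 5: Timeid. Action tag: <stime>.]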
First, users create a set of training texts for a domain and mark the positive examples of relevant named entities in them. The rest of the corpus is treated as a pool of negative examples.
The algorithm then goes through a training stage using this corpus.
Each tagging rule is induced for a single boundary, left or right, of a named entity. For every positive example, the algorithm performs the following steps:
1. Build the initial rule.
2. Generalize the rule.
3. Keep the k best generalizations of the initial rule.
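
The three steps above can be illustrated with a minimal Python sketch. It is not the original implementation. The rule representation (a set of `(offset, feature, value)` constraints around a boundary position), the exhaustive enumeration of generalizations, the ranking by error rate and then coverage, and all function names are assumptions made for illustration. The toy corpus is also hypothetical.

```python
from itertools import combinations

def tok(word, **feats):
    """Hypothetical token: a dict of features such as word, lexcat, semcat."""
    return {"word": word, **feats}

def build_initial_rule(sentence, pos, window=1):
    """Step 1: most specific rule, one constraint per feature in the window."""
    return frozenset(
        (offset, feat, val)
        for offset in range(-window, window + 1)
        if 0 <= pos + offset < len(sentence)
        for feat, val in sentence[pos + offset].items()
    )

def matches(rule, sentence, pos):
    """A rule fires at pos if every constraint holds at pos + offset."""
    for offset, feat, val in rule:
        i = pos + offset
        if not (0 <= i < len(sentence)) or sentence[i].get(feat) != val:
            return False
    return True

def generalizations(rule):
    """Step 2: every non-empty subset of constraints (assumed; tiny rules only)."""
    items = sorted(rule)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)

def score(rule, corpus, positives):
    """Count matches on marked boundaries (hits) and elsewhere (errors)."""
    hits = errors = 0
    for s, sentence in enumerate(corpus):
        for pos in range(len(sentence)):
            if matches(rule, sentence, pos):
                if (s, pos) in positives:
                    hits += 1
                else:
                    errors += 1
    return hits, errors

def k_best(initial, corpus, positives, k=2):
    """Step 3: rank by error rate, then coverage, then brevity (assumed order)."""
    scored = []
    for rule in generalizations(initial):
        hits, errors = score(rule, corpus, positives)
        # hits >= 1: every generalization still matches its seed example.
        scored.append((errors / (hits + errors), -hits, len(rule), sorted(rule)))
    scored.sort()
    return [frozenset(entry[-1]) for entry in scored[:k]]

# Toy corpus; positives are (sentence index, token index) of a left boundary.
corpus = [
    [tok("the"), tok("seminar"), tok("at"), tok("4", lexcat="digit"),
     tok("pm", semcat="timeid"), tok("will")],
    [tok("lecture"), tok("at"), tok("3", lexcat="digit"),
     tok("pm", semcat="timeid")],
    [tok("room"), tok("4", lexcat="digit"), tok("at"),
     tok("noon", semcat="timeid")],
]
positives = {(0, 3), (1, 2)}

initial = build_initial_rule(corpus[0], 3)
best = k_best(initial, corpus, positives, k=2)
for rule in best:
    print(sorted(rule), score(rule, corpus, positives))
```

In this sketch every position that is not marked as a positive example counts as a negative one, which mirrors the use of the rest of the corpus as a pool of negative examples. Enumerating all subsets grows exponentially with the number of constraints, so a practical system would need a more selective search.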