|
|
|
First, users create a set
of training texts for a domain. They mark positive examples of the relevant named
entities. The rest of the corpus is treated as a pool of negative examples.
|
|
The algorithm then goes through
a training stage using this corpus.
|
|
Each tagging rule is induced
for only one boundary, left or right, of a named entity. For every positive
example, the algorithm performs several steps:
|
|
1. build an initial rule
|
|
2. generalize the rule
|
|
3. keep the k best
generalizations of the initial rule
|
|
|
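The three steps above can be sketched roughly as follows. This is a minimal illustration, not the algorithm's actual implementation: the text does not say how rules are represented, how they are generalized, or how "best" is measured. The sketch assumes that a rule is a set of exact-word constraints in a small window around the boundary, that generalization means dropping constraints, and that rules are ranked by error rate on the training corpus, then by coverage. All function names, the `window` parameter, and the toy corpus are hypothetical.

```python
from itertools import combinations

def build_initial_rule(tokens, boundary, window=2):
    """Step 1: most specific rule, one (offset, word) constraint per token
    in a window around the boundary (assumed representation)."""
    rule = []
    for offset in range(-window, window):
        i = boundary + offset
        if 0 <= i < len(tokens):
            rule.append((offset, tokens[i]))
    return tuple(rule)

def generalize(rule):
    """Step 2: relax the rule by keeping every non-empty subset of its
    constraints (assumed generalization strategy)."""
    candidates = []
    for size in range(1, len(rule) + 1):
        candidates.extend(combinations(rule, size))
    return candidates

def matches(rule, tokens, position):
    """True if every constraint of the rule holds at this position."""
    for offset, word in rule:
        i = position + offset
        if not (0 <= i < len(tokens)) or tokens[i] != word:
            return False
    return True

def score(rule, corpus):
    """Count matches on marked boundaries (correct) and on everything
    else, which serves as the pool of negative examples (wrong)."""
    correct = wrong = 0
    for tokens, positives in corpus:
        for position in range(len(tokens) + 1):
            if matches(rule, tokens, position):
                if position in positives:
                    correct += 1
                else:
                    wrong += 1
    return correct, wrong

def keep_k_best(initial_rule, corpus, k):
    """Step 3: rank generalizations by error rate, then coverage, then
    rule length (assumed criterion), and keep the first k."""
    def rank(rule):
        correct, wrong = score(rule, corpus)
        total = correct + wrong
        error_rate = wrong / total if total else 1.0
        return (error_rate, -correct, len(rule))
    return sorted(generalize(initial_rule), key=rank)[:k]

# Toy corpus: (tokens, positions of the left boundary of a time expression).
corpus = [
    (["meeting", "at", "4", "pm", "today"], {2}),
    (["call", "at", "noon", "please"], {2}),
    (["look", "at", "this"], set()),  # contains no entity: negatives only
]
initial = build_initial_rule(corpus[0][0], boundary=2)
best = keep_k_best(initial, corpus, k=3)
```

In this toy corpus, the rule that only requires a preceding "at" covers both positive examples but also fires on the unannotated third sentence, so under the assumed ranking it loses to rules that make no errors. A real system would likely generalize over richer features (for example part of speech or capitalization) rather than exact words only.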