 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
|
|
 |
|
|
|
|
|
|
|
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
• |
Refined
|
|
|
|
– |
Lexical: keyword lists
|
|
|
for
specific fields.
|
|
|
|
– |
Position: first 15 lines
|
|
|
of
the article (simulates
|
|
|
the
header of article)
|
|
|
|
– |
Email: handwritten
|
|
|
regular
expressions
|
|
|
|
– |
Numeric: both digit and
|
|
word
based
|
|
|
|
– |
Orthographic:
|
|
|
capitalization
feature
|
|
|
|
|
|
|
 |
 |
 |
 |
 |
 |
 |
• |
Baseline
|
|
|
|
– |
Vocabulary: lexical
|
|
|
tokens
+/- window of 2
|
|
|
|
|
|
|
 |
Purposefully
simple to assess the
|
efficacy of CRFs and
features
|
|
|
|
|
|
|
|
|
 |
|
 |
|
|