Features
Refined
• Lexical: keyword lists for specific fields
• Position: first 15 lines of the article (approximates the article header)
• Email: handwritten regular expressions
• Numeric: both digit-based and word-based
• Orthographic: capitalization feature
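
A minimal sketch of how the refined features could be computed per token. The keyword lists, field names, number words, and email pattern below are placeholders, since the slide does not give the actual ones; only the 15-line header cutoff comes from the slide.

```python
import re

HEADER_LINES = 15  # from the slide: the first 15 lines stand in for the header

# Placeholder keyword lists; the real lists and field names are not given.
KEYWORDS = {
    "month": {"january", "february", "march"},
    "venue": {"conference", "workshop", "symposium"},
}
# Placeholder set of number words for the word-based numeric feature.
NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}
# Illustrative email pattern, not the handwritten expressions from the slide.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def refined_features(token, line_index):
    """Return a feature dict for one token; line_index is the 0-based line number."""
    lower = token.lower()
    feats = {
        "in_header": line_index < HEADER_LINES,          # position
        "is_email": EMAIL_RE.match(token) is not None,   # email
        "is_digit": token.isdigit(),                     # numeric, digit-based
        "is_number_word": lower in NUMBER_WORDS,         # numeric, word-based
        "is_capitalized": token[:1].isupper(),           # orthographic
        "is_all_caps": token.isupper(),                  # orthographic
    }
    # Lexical: one boolean per keyword list.
    for field, words in KEYWORDS.items():
        feats["kw_" + field] = lower in words
    return feats
```

For example, `refined_features("Workshop", 0)` sets `in_header`, `is_capitalized`, and `kw_venue` to true and leaves the email and numeric features false.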
Baseline
• Vocabulary: lexical tokens within a window of ±2
Purposefully simple, to assess the efficacy of CRFs and of the features
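
The baseline window can be sketched as follows. This assumes a flat token list and a `<PAD>` marker for positions that fall off either end; both the feature names and the padding convention are illustrative choices, not taken from the slide.

```python
def baseline_features(tokens, i, window=2):
    """Return the token at position i plus its neighbours within +/- window."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        key = f"tok[{offset:+d}]"  # e.g. tok[-2], tok[+0], tok[+2]
        # Positions outside the sequence get a padding marker (an assumption).
        feats[key] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats
```

With `tokens = ["Call", "for", "papers", "due", "May"]`, calling `baseline_features(tokens, 0)` yields `<PAD>` for the two left positions, `"Call"` for the current token, and `"for"` and `"papers"` to the right.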