sg.edu.nus.comp.nlp.ims.util
Class CPennTreeBankTokenizer
java.lang.Object
sg.edu.nus.comp.nlp.ims.util.CPennTreeBankTokenizer
- All Implemented Interfaces:
- ITokenizer
public class CPennTreeBankTokenizer
- extends java.lang.Object
- implements ITokenizer
Penn Treebank tokenizer. Read the following note before using this class:
CPennTreeBankTokenizer can only handle a string that contains a single
line. Passing a string that spans multiple lines may produce unexpected
errors.
- Author:
- zhongzhi
Field Summary
protected static java.util.regex.Pattern[] PREPATTERN
protected static java.util.ArrayList<java.lang.String> PREREGEX
protected static java.util.ArrayList<java.lang.String> PREREPLACE
protected static java.util.regex.Pattern SEGMENTER
Method Summary
static void main(java.lang.String[] args)
java.lang.String[] tokenize(java.lang.String p_Sentence)
- Tokenize an input sentence into tokens
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PREREGEX
protected static java.util.ArrayList<java.lang.String> PREREGEX
PREREPLACE
protected static java.util.ArrayList<java.lang.String> PREREPLACE
PREPATTERN
protected static java.util.regex.Pattern[] PREPATTERN
SEGMENTER
protected static java.util.regex.Pattern SEGMENTER
CPennTreeBankTokenizer
public CPennTreeBankTokenizer()
tokenize
public java.lang.String[] tokenize(java.lang.String p_Sentence)
- Description copied from interface: ITokenizer
- Tokenize an input sentence into tokens
- Specified by:
- tokenize in interface ITokenizer
- Parameters:
- p_Sentence - input sentence
- Returns:
- tokens
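Because tokenize expects a string that contains only one line, multi-line text is best split before it is passed in. The following is a minimal sketch of that idea, not part of this package: the class LineByLineTokenizer and the method tokenizeLines are hypothetical names, and the whitespace splitter in main is only a stand-in so the example runs without the IMS library on the classpath. With the library available, passing new CPennTreeBankTokenizer()::tokenize in its place should work, since the class exposes a public no-argument constructor and tokenize(String).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of sg.edu.nus.comp.nlp.ims.util.
public class LineByLineTokenizer {

    /**
     * Splits the text on line breaks and tokenizes each non-empty line
     * separately, so the tokenizer never receives a multi-line string.
     */
    public static List<String[]> tokenizeLines(String text,
                                               Function<String, String[]> tokenizer) {
        List<String[]> result = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, \r, ...); Java 8+
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(tokenizer.apply(trimmed));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in tokenizer (assumption): plain whitespace splitting.
        // Replace with new CPennTreeBankTokenizer()::tokenize when the
        // IMS library is on the classpath.
        Function<String, String[]> standIn = s -> s.split("\\s+");

        String text = "First line here.\nSecond line here.";
        for (String[] tokens : tokenizeLines(text, standIn)) {
            System.out.println(String.join(" | ", tokens));
        }
    }
}
```

Taking the tokenizer as a Function keeps the helper independent of any particular ITokenizer implementation, and skipping blank lines avoids handing empty strings to the tokenizer.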
main
public static void main(java.lang.String[] args)