sg.edu.nus.comp.nlp.ims.util
Class CPennTreeBankTokenizer

java.lang.Object
  extended by sg.edu.nus.comp.nlp.ims.util.CPennTreeBankTokenizer
All Implemented Interfaces:
ITokenizer

public class CPennTreeBankTokenizer
extends java.lang.Object
implements ITokenizer

Penn Treebank tokenizer. Read this note before using the class: CPennTreeBankTokenizer can only handle a string that contains a single line. Passing a string that spans multiple lines may produce unexpected errors.
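
Because of the single-line restriction, callers with multi-line text need to split it before tokenizing. The following is a minimal sketch of one way to do that, not part of this package. The class name LineByLineTokenizing and the method tokenizeLines are hypothetical, and the tokenizer is passed in as a java.util.function.Function so the example runs without the IMS library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of sg.edu.nus.comp.nlp.ims.util.
public class LineByLineTokenizing {

    /**
     * Splits text on line breaks and tokenizes each non-empty line separately.
     * The tokenizer argument stands in for ITokenizer.tokenize(String).
     */
    public static List<String[]> tokenizeLines(String text,
                                               Function<String, String[]> tokenizer) {
        List<String[]> result = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, \r, ...); requires Java 8+.
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                // Each call receives exactly one line.
                result.add(tokenizer.apply(trimmed));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Whitespace splitting is only a stand-in for a real tokenizer.
        Function<String, String[]> standIn = s -> s.split("\\s+");
        for (String[] tokens : tokenizeLines("First line here.\nSecond line.", standIn)) {
            System.out.println(String.join(" | ", tokens));
        }
    }
}
```

With this class on the classpath, the stand-in could presumably be replaced by a method reference such as `new CPennTreeBankTokenizer()::tokenize`, since tokenize takes a String and returns a String[].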

Author:
zhongzhi

Field Summary
protected static java.util.regex.Pattern[] PREPATTERN
           
protected static java.util.ArrayList<java.lang.String> PREREGEX
           
protected static java.util.ArrayList<java.lang.String> PREREPLACE
           
protected static java.util.regex.Pattern SEGMENTER
           
 
Constructor Summary
CPennTreeBankTokenizer()
           
 
Method Summary
static void main(java.lang.String[] args)
           
 java.lang.String[] tokenize(java.lang.String p_Sentence)
          Tokenizes an input sentence into tokens.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PREREGEX

protected static java.util.ArrayList<java.lang.String> PREREGEX

PREREPLACE

protected static java.util.ArrayList<java.lang.String> PREREPLACE

PREPATTERN

protected static java.util.regex.Pattern[] PREPATTERN

SEGMENTER

protected static java.util.regex.Pattern SEGMENTER

Constructor Detail

CPennTreeBankTokenizer

public CPennTreeBankTokenizer()

Method Detail

tokenize

public java.lang.String[] tokenize(java.lang.String p_Sentence)
Description copied from interface: ITokenizer
Tokenizes an input sentence into tokens.

Specified by:
tokenize in interface ITokenizer
Parameters:
p_Sentence - the input sentence (a single line of text)
Returns:
tokens
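
To illustrate the contract (one sentence in, an array of tokens out), here is a rough sketch of Penn Treebank-style tokenization built from ordered regex replacements followed by a whitespace split. It is an assumption-laden illustration only: the class TreebankStyleSketch and its four rules are hypothetical and are not the contents of PREREGEX, PREREPLACE, PREPATTERN, or SEGMENTER, which this page does not document.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; NOT the implementation of CPennTreeBankTokenizer.
public class TreebankStyleSketch {

    // Each rule is {regex, replacement}; rules are applied in order.
    private static final String[][] RULES = {
        {"([,;:?!()\"])", " $1 "},             // set off common punctuation
        {"\\.\\s*$", " ."},                    // split a sentence-final period
        {"(?i)n't\\b", " n't"},                // don't -> do n't
        {"(?i)'(s|re|ve|ll|d|m)\\b", " '$1"},  // it's -> it 's
    };

    // Compile the patterns once rather than on every call.
    private static final Pattern[] PATTERNS = new Pattern[RULES.length];
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static {
        for (int i = 0; i < RULES.length; i++) {
            PATTERNS[i] = Pattern.compile(RULES[i][0]);
        }
    }

    /** Tokenizes a single-line sentence; returns an empty array for blank input. */
    public static String[] tokenize(String sentence) {
        String text = sentence;
        for (int i = 0; i < PATTERNS.length; i++) {
            text = PATTERNS[i].matcher(text).replaceAll(RULES[i][1]);
        }
        text = text.trim();
        return text.isEmpty() ? new String[0] : WHITESPACE.split(text);
    }

    public static void main(String[] args) {
        // Prints: Do n't panic , it 's fine .
        System.out.println(String.join(" ", tokenize("Don't panic, it's fine.")));
    }
}
```

The real class may split differently (for example around quotes, brackets, or abbreviations), so its output should be checked directly rather than inferred from this sketch.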

main

public static void main(java.lang.String[] args)