sg.edu.nus.comp.nlp.ims.corpus
Class CLexicalCorpus

java.lang.Object
  extended by sg.edu.nus.comp.nlp.ims.corpus.ACorpus
      extended by sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
All Implemented Interfaces:
ICorpus
Direct Known Subclasses:
CAllWordsCoarseTaskCorpus, CAllWordsFineTaskCorpus

public class CLexicalCorpus
extends ACorpus

SensEval-2 lexical sample task test corpus. (Also used for training.)

Author:
zhongzhi

Field Summary
protected static java.lang.String HEADEND
           
protected static java.util.regex.Pattern HEADENDPATTERN
           
protected static java.util.regex.Pattern HEADPATTERN
           
protected static java.lang.String HEADSTART
           
protected static java.util.regex.Pattern HEADSTARTPATTERN
           
protected static java.lang.String LEXELTMARK
           
protected static java.util.regex.Pattern LEXELTPATTERN
           
protected static java.lang.String SATEND
           
protected static java.util.regex.Pattern SATENDPATTERN
           
protected static java.util.regex.Pattern SATPATTERN
           
protected static java.lang.String SATSTART
           
protected static java.util.regex.Pattern SATSTARTPATTERN
           
 
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
g_LIDX, g_PIDX, g_TIDX, m_Boundaries, m_DefaultDelimiter, m_Delimiter, m_DocIDs, m_IDs, m_Indice, m_InstanceLemmas, m_InstancePOSs, m_InstanceTokens, m_Lemmatized, m_Lemmatizer, m_Lengths, m_LexeltIDs, m_POSTagged, m_POSTagger, m_Ready, m_SatID2Index, m_SatIDs, m_SatIndice, m_SatSentenceIDs, m_SentenceIDs, m_Sentences, m_SentenceSplitter, m_Split, m_Tags, m_Tokenized, m_Tokenizer
 
Constructor Summary
CLexicalCorpus()
          default constructor
CLexicalCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
          constructor with the given preprocessing components
 
Method Summary
protected  void genInfo()
          collect some information
 boolean load(java.io.Reader p_Reader)
          load data into corpus
protected  void tokenizeSentence(java.lang.String p_Sentence)
          tokenize a sentence
 
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
clear, getIndexInSentence, getLength, getLowerBoundary, getSentence, getSentenceID, getTag, getUpperBoundary, getValue, isReady, isValidInstance, isValidSentence, lemmatize, numOfSentences, posTag, setDelimiter, setLemmatized, setPOSTagged, setSplit, setTokenized, size, tokenize, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

HEADSTART

protected static java.lang.String HEADSTART

HEADEND

protected static java.lang.String HEADEND

SATSTART

protected static java.lang.String SATSTART

SATEND

protected static java.lang.String SATEND

HEADPATTERN

protected static java.util.regex.Pattern HEADPATTERN

HEADSTARTPATTERN

protected static java.util.regex.Pattern HEADSTARTPATTERN

HEADENDPATTERN

protected static java.util.regex.Pattern HEADENDPATTERN

SATPATTERN

protected static java.util.regex.Pattern SATPATTERN

SATSTARTPATTERN

protected static java.util.regex.Pattern SATSTARTPATTERN

SATENDPATTERN

protected static java.util.regex.Pattern SATENDPATTERN

LEXELTMARK

protected static final java.lang.String LEXELTMARK
See Also:
Constant Field Values

LEXELTPATTERN

protected static java.util.regex.Pattern LEXELTPATTERN
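
The marker strings and patterns above carry no descriptions on this page. As a rough illustration of how a head-marker pattern of this kind could be used, here is a minimal sketch. It assumes the markers are the SensEval-2 style tags <head> and </head>; the actual field values, the pattern shown, and the helper names findHeads and stripMarkers are assumptions for illustration, not taken from CLexicalCorpus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadMarkerSketch {
    // Assumed marker strings; the real HEADSTART/HEADEND values are not shown on this page.
    static final String HEADSTART = "<head>";
    static final String HEADEND = "</head>";

    // Assumed pattern: captures the text between the two markers, non-greedily.
    static final Pattern HEADPATTERN =
            Pattern.compile(Pattern.quote(HEADSTART) + "(.*?)" + Pattern.quote(HEADEND));

    /** Returns every marked target word in the sentence, in order of appearance. */
    static List<String> findHeads(String sentence) {
        List<String> heads = new ArrayList<>();
        Matcher m = HEADPATTERN.matcher(sentence);
        while (m.find()) {
            heads.add(m.group(1).trim());
        }
        return heads;
    }

    /** Removes the markers but keeps the marked word, leaving plain text. */
    static String stripMarkers(String sentence) {
        return HEADPATTERN.matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        String s = "He sat on the <head>bank</head> of the river .";
        System.out.println(findHeads(s));    // prints [bank]
        System.out.println(stripMarkers(s)); // prints He sat on the bank of the river .
    }
}
```

The same approach would presumably carry over to the SAT and LEXELT markers, with the marker strings swapped for the corresponding values.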
Constructor Detail

CLexicalCorpus

public CLexicalCorpus()
default constructor


CLexicalCorpus

public CLexicalCorpus(IPOSTagger p_POSTagger,
                      ISentenceSplitter p_Splitter,
                      ITokenizer p_Tokenizer,
                      ILemmatizer p_Lemmatizer)
constructor with the given preprocessing components

Parameters:
p_POSTagger - POS tagger
p_Splitter - Sentence splitter
p_Tokenizer - tokenizer
p_Lemmatizer - lemmatizer

Method Detail

load

public boolean load(java.io.Reader p_Reader)
             throws java.lang.Exception
Description copied from interface: ICorpus
load data into corpus

Specified by:
load in interface ICorpus
Specified by:
load in class ACorpus
Parameters:
p_Reader - reader of the input stream
Returns:
true if the corpus is ready after loading, false otherwise
Throws:
java.lang.Exception - exception while loading file
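
The calling contract of load (any java.io.Reader in, a boolean out, a checked java.lang.Exception) can be sketched as follows. So that the example runs without IMS on the classpath, it uses a hypothetical stand-in interface Loadable and a toy class LineCountingCorpus. Neither is part of IMS, and the toy's behavior (counting non-blank lines, closing the reader) says nothing about what CLexicalCorpus.load actually does.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

class LoadSketch {
    /** Hypothetical stand-in that mirrors only the documented signature of load. */
    interface Loadable {
        boolean load(Reader reader) throws Exception;
    }

    /** Toy implementation: counts non-blank lines and is "ready" if any were read. */
    static final class LineCountingCorpus implements Loadable {
        private int lines = 0;

        @Override
        public boolean load(Reader reader) throws Exception {
            // The toy closes the reader; whether the real class does is not documented here.
            try (BufferedReader in = new BufferedReader(reader)) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (!line.trim().isEmpty()) {
                        lines++;
                    }
                }
            }
            return lines > 0;
        }

        int size() {
            return lines;
        }
    }

    /** Caller-side pattern: treat both a false result and an exception as "not loaded". */
    static boolean tryLoad(Loadable corpus, String text) {
        try {
            return corpus.load(new StringReader(text));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // With IMS available, the documented equivalent would be: new CLexicalCorpus()
        LineCountingCorpus corpus = new LineCountingCorpus();
        boolean ready = tryLoad(corpus, "first instance\n\nsecond instance\n");
        System.out.println(ready + " " + corpus.size()); // prints true 2
    }
}
```

Because load accepts a Reader rather than a file name, the same call works for a FileReader over a corpus file or a StringReader over in-memory text.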

tokenizeSentence

protected void tokenizeSentence(java.lang.String p_Sentence)
Description copied from class: ACorpus
tokenize a sentence

Specified by:
tokenizeSentence in class ACorpus
Parameters:
p_Sentence - input sentence

genInfo

protected void genInfo()
Description copied from class: ACorpus
collect some information

Overrides:
genInfo in class ACorpus