Class CTRECCorpus

java.lang.Object
  extended by sg.edu.nus.comp.nlp.ims.corpus.ACorpus
      extended by CTRECCorpus
All Implemented Interfaces:
ICorpus

public class CTRECCorpus
extends ACorpus


Field Summary
 
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
g_LIDX, g_PIDX, g_TIDX, m_AlphabeticPattern, m_Boundaries, m_DefaultDelimiter, m_Delimiter, m_DocIDs, m_IDs, m_Indice, m_Instances, m_Judge, m_Lemmatized, m_Lemmatizer, m_LexeltIDs, m_POSs, m_POSTagged, m_POSTagger, m_SatID2Index, m_SatIDs, m_SatIndice, m_SatSentenceIDs, m_SentenceIDs, m_Sentences, m_SentenceSplitter, m_Split, m_Tags, m_Tokenized, m_Tokenizer
 
Constructor Summary
CTRECCorpus()
          default constructor
CTRECCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
          constructor with the supplied preprocessing components
 
Method Summary
protected  void genInfo()
          collect some information
 int getLowerBoundary(int p_Sentence)
          get lower boundary
 int getUpperBoundary(int p_Sentence)
          get upper boundary
 boolean load(java.io.Reader p_Reader)
          load data into corpus
static void main(java.lang.String[] p_Args)
           
protected  void posTag()
          POS tagging
protected  void tokenizeSentence(java.lang.String p_Sentence)
          tokenize a sentence
 
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
alphabetic, clear, getIndexInSentence, getSentence, getSentenceID, getTag, getValue, lemmatize, numOfSentences, setDelimiter, setLemmatized, setPOSTagged, setSplit, setTokenized, size, tokenize, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

CTRECCorpus

public CTRECCorpus()
default constructor


CTRECCorpus

public CTRECCorpus(IPOSTagger p_POSTagger,
                   ISentenceSplitter p_Splitter,
                   ITokenizer p_Tokenizer,
                   ILemmatizer p_Lemmatizer)
constructor with the supplied preprocessing components

Parameters:
p_POSTagger - POS tagger
p_Splitter - sentence splitter
p_Tokenizer - tokenizer
p_Lemmatizer - lemmatizer
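
The pattern behind this constructor, passing preprocessing components in rather than creating them internally, can be sketched as follows. This is only an illustration: `ComponentSketch` and its nested `Splitter`, `Tokenizer`, and `Lemmatizer` interfaces are hypothetical stand-ins, not the IMS types `ISentenceSplitter`, `ITokenizer`, and `ILemmatizer`, whose method signatures are not shown on this page. The POS tagger is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a corpus that receives its components via the constructor.
class ComponentSketch {
    // Assumed single-method interfaces; the real IMS interfaces may differ.
    interface Splitter { String[] split(String text); }
    interface Tokenizer { String[] tokenize(String sentence); }
    interface Lemmatizer { String lemmatize(String token); }

    private final Splitter splitter;
    private final Tokenizer tokenizer;
    private final Lemmatizer lemmatizer;

    ComponentSketch(Splitter splitter, Tokenizer tokenizer, Lemmatizer lemmatizer) {
        this.splitter = splitter;
        this.tokenizer = tokenizer;
        this.lemmatizer = lemmatizer;
    }

    // Runs the injected components in order: split, tokenize, lemmatize.
    List<String> process(String text) {
        List<String> out = new ArrayList<>();
        for (String sentence : splitter.split(text)) {
            for (String token : tokenizer.tokenize(sentence)) {
                out.add(lemmatizer.lemmatize(token));
            }
        }
        return out;
    }

    // Wires up deliberately naive components for demonstration only.
    static ComponentSketch simple() {
        return new ComponentSketch(
            text -> text.split("(?<=[.!?])\\s+"),       // break after sentence-final punctuation
            sentence -> sentence.trim().split("\\s+"),  // whitespace tokenizer
            token -> token.toLowerCase(Locale.ROOT));   // lowercasing as a placeholder "lemmatizer"
    }

    public static void main(String[] args) {
        System.out.println(simple().process("Dogs bark. Cats purr."));
    }
}
```

Because the components arrive through the constructor, any one of them can be swapped (for example, a different tokenizer) without changing the class that uses them.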
Method Detail

load

public boolean load(java.io.Reader p_Reader)
Description copied from interface: ICorpus
load data into corpus

Parameters:
p_Reader - reader of the input stream
Returns:
true if loading and pre-processing succeeded, false otherwise
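
A minimal sketch of how a `load(Reader)`-style method is typically called and how its boolean result is checked. `LoadSketch` is a hypothetical stand-in that mirrors only the `boolean load(java.io.Reader)` signature; collecting non-empty lines is an assumption made for illustration and is not a description of what CTRECCorpus does with its input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: mirrors only the boolean load(Reader) signature.
class LoadSketch {
    private final List<String> lines = new ArrayList<>();

    // Reads all non-empty lines; reports I/O failure through the return value.
    boolean load(Reader reader) {
        lines.clear();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed);
                }
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Number of lines kept by the last successful load.
    int size() {
        return lines.size();
    }

    public static void main(String[] args) {
        LoadSketch corpus = new LoadSketch();
        // Any Reader works: StringReader here, FileReader or InputStreamReader in practice.
        boolean ok = corpus.load(new StringReader("first line\n\nsecond line\n"));
        System.out.println(ok + " " + corpus.size()); // prints "true 2"
    }
}
```

Accepting a `java.io.Reader` rather than a file name lets the caller decide where the text comes from and which character encoding is used.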

posTag

protected void posTag()
Description copied from class: ACorpus
POS tagging

Overrides:
posTag in class ACorpus

tokenizeSentence

protected void tokenizeSentence(java.lang.String p_Sentence)
Description copied from class: ACorpus
tokenize a sentence

Specified by:
tokenizeSentence in class ACorpus
Parameters:
p_Sentence - input sentence

genInfo

protected void genInfo()
Description copied from class: ACorpus
collect some information

Overrides:
genInfo in class ACorpus

getLowerBoundary

public int getLowerBoundary(int p_Sentence)
Description copied from interface: ICorpus
get lower boundary

Specified by:
getLowerBoundary in interface ICorpus
Overrides:
getLowerBoundary in class ACorpus
Parameters:
p_Sentence - sentence number
Returns:
lower boundary

getUpperBoundary

public int getUpperBoundary(int p_Sentence)
Description copied from interface: ICorpus
get upper boundary

Specified by:
getUpperBoundary in interface ICorpus
Overrides:
getUpperBoundary in class ACorpus
Parameters:
p_Sentence - sentence number
Returns:
upper boundary

main

public static void main(java.lang.String[] p_Args)
Parameters:
p_Args - arguments