sg.edu.nus.comp.nlp.ims.corpus
Class CAllWordsCoarseTaskCorpus

java.lang.Object
  extended by sg.edu.nus.comp.nlp.ims.corpus.ACorpus
      extended by sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
          extended by sg.edu.nus.comp.nlp.ims.corpus.CAllWordsCoarseTaskCorpus
All Implemented Interfaces:
ICorpus

public class CAllWordsCoarseTaskCorpus
extends CLexicalCorpus

SemEval 2007 coarse-grained all-words task test corpus.

Author:
zhongzhi

Field Summary
 
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
HEADEND, HEADENDPATTERN, HEADPATTERN, HEADSTART, HEADSTARTPATTERN, LEXELTMARK, LEXELTPATTERN, SATEND, SATENDPATTERN, SATPATTERN, SATSTART, SATSTARTPATTERN
 
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
g_LIDX, g_PIDX, g_TIDX, m_Boundaries, m_DefaultDelimiter, m_Delimiter, m_DocIDs, m_IDs, m_Indice, m_InstanceLemmas, m_InstancePOSs, m_InstanceTokens, m_Lemmatized, m_Lemmatizer, m_Lengths, m_LexeltIDs, m_POSTagged, m_POSTagger, m_Ready, m_SatID2Index, m_SatIDs, m_SatIndice, m_SatSentenceIDs, m_SentenceIDs, m_Sentences, m_SentenceSplitter, m_Split, m_Tags, m_Tokenized, m_Tokenizer
 
Constructor Summary
CAllWordsCoarseTaskCorpus()
          default constructor
CAllWordsCoarseTaskCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
          constructor that takes the POS tagger, sentence splitter, tokenizer and lemmatizer to use
 
Method Summary
protected  void genInfo()
          collect some information
 boolean load(java.io.Reader p_Reader)
          load data into corpus
protected  java.lang.String loadSentence(org.jdom.Element p_Sentence)
          load sentence
protected  java.util.ArrayList<java.lang.String> loadText(org.jdom.Element p_Text)
          load text
 
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
tokenizeSentence
 
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
clear, getIndexInSentence, getLength, getLowerBoundary, getSentence, getSentenceID, getTag, getUpperBoundary, getValue, isReady, isValidInstance, isValidSentence, lemmatize, numOfSentences, posTag, setDelimiter, setLemmatized, setPOSTagged, setSplit, setTokenized, size, tokenize, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

CAllWordsCoarseTaskCorpus

public CAllWordsCoarseTaskCorpus()
default constructor


CAllWordsCoarseTaskCorpus

public CAllWordsCoarseTaskCorpus(IPOSTagger p_POSTagger,
                                 ISentenceSplitter p_Splitter,
                                 ITokenizer p_Tokenizer,
                                 ILemmatizer p_Lemmatizer)
constructor that takes the POS tagger, sentence splitter, tokenizer and lemmatizer to use

Parameters:
p_POSTagger - POS tagger
p_Splitter - Sentence splitter
p_Tokenizer - tokenizer
p_Lemmatizer - lemmatizer
Method Detail

load

public boolean load(java.io.Reader p_Reader)
             throws java.lang.Exception
Description copied from interface: ICorpus
load data into corpus

Specified by:
load in interface ICorpus
Overrides:
load in class CLexicalCorpus
Parameters:
p_Reader - reader of the input stream
Returns:
whether the corpus is ready after loading
Throws:
java.lang.Exception - exception while loading file
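
Because load both returns a boolean and declares a checked java.lang.Exception, callers have two failure paths to handle. The following is a minimal sketch of that calling pattern only. It does not use the IMS classes: Loadable is a hypothetical stand-in interface that mirrors the documented load signature, and tryLoad is an illustrative helper name, not part of this API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LoadUsageSketch {

    // Hypothetical stand-in: mirrors only the documented signature
    // "boolean load(java.io.Reader) throws java.lang.Exception".
    interface Loadable {
        boolean load(Reader reader) throws Exception;
    }

    // Calls load, closes the reader afterwards, and folds both failure
    // paths (a false return value and a thrown exception) into one boolean.
    static boolean tryLoad(Loadable corpus, Reader reader) {
        try (Reader r = reader) {
            return corpus.load(r);
        } catch (Exception e) {
            System.err.println("load failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // A stand-in that always reports success.
        Loadable alwaysReady = r -> true;
        // A stand-in that fails while reading.
        Loadable failing = r -> { throw new IOException("unreadable input"); };

        System.out.println(tryLoad(alwaysReady, new StringReader("<corpus/>")));
        System.out.println(tryLoad(failing, new StringReader("<corpus/>")));
    }
}
```

With the real class, the object passed as corpus would be a CAllWordsCoarseTaskCorpus instance and the reader would typically wrap the test corpus file.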

loadText

protected java.util.ArrayList<java.lang.String> loadText(org.jdom.Element p_Text)
                                                  throws java.lang.Exception
load text

Parameters:
p_Text - text element
Returns:
paragraphs
Throws:
java.lang.Exception - exception when loading text element

loadSentence

protected java.lang.String loadSentence(org.jdom.Element p_Sentence)
                                 throws java.lang.Exception
load sentence

Parameters:
p_Sentence - sentence element
Returns:
sentence
Throws:
java.lang.Exception - exception when loading sentence element
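
The load, loadText and loadSentence methods suggest a top-down walk over the corpus XML: the document, then each text element, then each sentence element. The sketch below illustrates that walk under stated assumptions and is not the IMS implementation. It uses the JDK's built-in DOM parser (org.w3c.dom) instead of org.jdom, assumes the element names corpus, text, sentence and instance, and returns one string per sentence rather than paragraphs. The class name CoarseCorpusSketch and the sample XML are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CoarseCorpusSketch {

    // Returns the text of one <sentence> element (including the text inside
    // nested <instance> elements) with runs of whitespace collapsed.
    static String loadSentence(Element sentence) {
        return sentence.getTextContent().trim().replaceAll("\\s+", " ");
    }

    // Collects the sentences of one <text> element in document order.
    static List<String> loadText(Element text) {
        List<String> sentences = new ArrayList<>();
        NodeList nodes = text.getElementsByTagName("sentence");
        for (int i = 0; i < nodes.getLength(); i++) {
            sentences.add(loadSentence((Element) nodes.item(i)));
        }
        return sentences;
    }

    // Parses the XML from the reader and gathers the sentences of every <text>.
    static List<String> load(Reader reader) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(reader));
        List<String> all = new ArrayList<>();
        NodeList texts = doc.getElementsByTagName("text");
        for (int i = 0; i < texts.getLength(); i++) {
            all.addAll(loadText((Element) texts.item(i)));
        }
        return all;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample in the assumed layout.
        String xml =
            "<corpus lang=\"en\"><text id=\"d001\">"
            + "<sentence id=\"d001.s001\">The "
            + "<instance id=\"d001.s001.t001\" lemma=\"editor\" pos=\"n\">editor</instance>"
            + " left .</sentence>"
            + "<sentence id=\"d001.s002\">Nobody "
            + "<instance id=\"d001.s002.t001\" lemma=\"notice\" pos=\"v\">noticed</instance>"
            + " .</sentence>"
            + "</text></corpus>";
        for (String s : load(new StringReader(xml))) {
            System.out.println(s);
        }
    }
}
```

The real class additionally records instance IDs, lemmas and POS tags and can run the configured tokenizer, tagger and lemmatizer, none of which this sketch attempts.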

genInfo

protected void genInfo()
Description copied from class: ACorpus
collect some information

Overrides:
genInfo in class CLexicalCorpus