sg.edu.nus.comp.nlp.ims.corpus
Class CAllWordsFineTaskCorpus

java.lang.Object
  extended by sg.edu.nus.comp.nlp.ims.corpus.ACorpus
      extended by sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
          extended by sg.edu.nus.comp.nlp.ims.corpus.CAllWordsFineTaskCorpus
All Implemented Interfaces:
ICorpus

public final class CAllWordsFineTaskCorpus
extends CLexicalCorpus

Test corpus for the SensEval-2/3 and SemEval 2007 fine-grained all-words tasks.

Author:
zhongzhi

Field Summary
protected  java.util.regex.Pattern m_CleanPattern
           
 
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
HEADEND, HEADENDPATTERN, HEADPATTERN, HEADSTART, HEADSTARTPATTERN, LEXELTMARK, LEXELTPATTERN, SATEND, SATENDPATTERN, SATPATTERN, SATSTART, SATSTARTPATTERN
 
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
g_LIDX, g_PIDX, g_TIDX, m_Boundaries, m_DefaultDelimiter, m_Delimiter, m_DocIDs, m_IDs, m_Indice, m_InstanceLemmas, m_InstancePOSs, m_InstanceTokens, m_Lemmatized, m_Lemmatizer, m_Lengths, m_LexeltIDs, m_POSTagged, m_POSTagger, m_Ready, m_SatID2Index, m_SatIDs, m_SatIndice, m_SatSentenceIDs, m_SentenceIDs, m_Sentences, m_SentenceSplitter, m_Split, m_Tags, m_Tokenized, m_Tokenizer
 
Constructor Summary
CAllWordsFineTaskCorpus()
          default constructor
CAllWordsFineTaskCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
          constructor taking the POS tagger, sentence splitter, tokenizer and lemmatizer
 
Method Summary
protected  void genInfo()
          collect some information
 boolean load(java.io.Reader p_Reader)
          load data into corpus
protected  java.lang.String loadText(org.jdom.Element p_Text)
          load texts
protected  java.util.ArrayList<java.util.ArrayList<java.lang.String>> split(java.util.ArrayList<java.lang.String> p_Texts)
          split paragraphs into sentences
 
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
tokenizeSentence
 
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus
clear, getIndexInSentence, getLength, getLowerBoundary, getSentence, getSentenceID, getTag, getUpperBoundary, getValue, isReady, isValidInstance, isValidSentence, lemmatize, numOfSentences, posTag, setDelimiter, setLemmatized, setPOSTagged, setSplit, setTokenized, size, tokenize, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_CleanPattern

protected java.util.regex.Pattern m_CleanPattern

Constructor Detail

CAllWordsFineTaskCorpus

public CAllWordsFineTaskCorpus()
default constructor


CAllWordsFineTaskCorpus

public CAllWordsFineTaskCorpus(IPOSTagger p_POSTagger,
                               ISentenceSplitter p_Splitter,
                               ITokenizer p_Tokenizer,
                               ILemmatizer p_Lemmatizer)
constructor taking the POS tagger, sentence splitter, tokenizer and lemmatizer

Parameters:
p_POSTagger - POS tagger
p_Splitter - Sentence splitter
p_Tokenizer - tokenizer
p_Lemmatizer - lemmatizer

Method Detail

load

public boolean load(java.io.Reader p_Reader)
             throws java.lang.Exception
Description copied from interface: ICorpus
load data into corpus

Specified by:
load in interface ICorpus
Overrides:
load in class CLexicalCorpus
Parameters:
p_Reader - reader of the input stream
Returns:
whether the corpus is ready
Throws:
java.lang.Exception - exception while loading file
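
The calling pattern implied by this signature (hand over a java.io.Reader, check the boolean result, let the exception propagate) might be sketched as follows. This is only an illustration under stated assumptions: Loadable is a hypothetical stand-in that mirrors the documented load(java.io.Reader) signature so the sketch runs without the IMS jar, the toy implementation is not the real parsing logic, and the "<corpus/>" string is a placeholder rather than the real input format.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

class LoadSketch {
    // Hypothetical stand-in mirroring the documented signature
    // "boolean load(java.io.Reader) throws java.lang.Exception".
    interface Loadable {
        boolean load(Reader reader) throws Exception;
    }

    // Opens a reader over in-memory text, passes it to the corpus,
    // and closes the reader whether or not loading succeeds.
    static boolean loadFrom(Loadable corpus, String text) throws Exception {
        try (Reader reader = new StringReader(text)) {
            return corpus.load(reader);
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy implementation (assumption): "ready" if at least one line was read.
        Loadable toy = r -> new BufferedReader(r).readLine() != null;
        // "<corpus/>" is placeholder input, not the actual task file format.
        boolean ready = loadFrom(toy, "<corpus/>");
        System.out.println("ready: " + ready);
    }
}
```

With the real class, the same pattern would presumably apply to a CAllWordsFineTaskCorpus instance and a reader over the task's test file, checking the returned flag before querying the corpus.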

split

protected java.util.ArrayList<java.util.ArrayList<java.lang.String>> split(java.util.ArrayList<java.lang.String> p_Texts)
split paragraphs into sentences

Parameters:
p_Texts - input paragraphs
Returns:
sentences
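
To make the input and output shapes concrete, here is a minimal sketch of a method with the same signature: one inner list of sentences per input paragraph. It is an assumption-laden illustration, not the class's actual implementation: java.text.BreakIterator stands in for the configured sentence splitter, and SplitSketch is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Locale;

class SplitSketch {
    // Returns one list of sentences for each paragraph in texts.
    // BreakIterator is a stand-in (assumption); the real class would
    // delegate to whichever sentence splitter it was constructed with.
    static ArrayList<ArrayList<String>> split(ArrayList<String> texts) {
        ArrayList<ArrayList<String>> result = new ArrayList<>();
        for (String text : texts) {
            ArrayList<String> sentences = new ArrayList<>();
            BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
            it.setText(text);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                String sentence = text.substring(start, end).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
            result.add(sentences);
        }
        return result;
    }

    public static void main(String[] args) {
        ArrayList<String> paragraphs = new ArrayList<>(Arrays.asList(
                "The bank raised its rates. Investors were surprised.",
                "He sat on the river bank."));
        // Prints the sentences found in each paragraph, one paragraph per line.
        for (ArrayList<String> sentences : split(paragraphs)) {
            System.out.println(sentences);
        }
    }
}
```

Keeping the outer list aligned with the input paragraphs preserves paragraph boundaries, which is what the nested return type suggests.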

loadText

protected java.lang.String loadText(org.jdom.Element p_Text)
                             throws java.lang.Exception
load texts

Parameters:
p_Text - text element
Returns:
plain text
Throws:
java.lang.Exception - exception while loading text element

genInfo

protected void genInfo()
Description copied from class: ACorpus
collect some information

Overrides:
genInfo in class CLexicalCorpus