java.lang.Object
sg.edu.nus.comp.nlp.ims.corpus.ACorpus
sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
sg.edu.nus.comp.nlp.ims.corpus.CAllWordsFineTaskCorpus
public final class CAllWordsFineTaskCorpus extends CLexicalCorpus
SensEval-2/3 and SemEval 2007 fine-grained all-words task test corpus.
Field Summary | |
---|---|
protected java.util.regex.Pattern | m_CleanPattern |
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus |
---|
HEADEND, HEADENDPATTERN, HEADPATTERN, HEADSTART, HEADSTARTPATTERN, LEXELTMARK, LEXELTPATTERN, SATEND, SATENDPATTERN, SATPATTERN, SATSTART, SATSTARTPATTERN |
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus |
---|
g_LIDX, g_PIDX, g_TIDX, m_Boundaries, m_DefaultDelimiter, m_Delimiter, m_DocIDs, m_IDs, m_Indice, m_InstanceLemmas, m_InstancePOSs, m_InstanceTokens, m_Lemmatized, m_Lemmatizer, m_Lengths, m_LexeltIDs, m_POSTagged, m_POSTagger, m_Ready, m_SatID2Index, m_SatIDs, m_SatIndice, m_SatSentenceIDs, m_SentenceIDs, m_Sentences, m_SentenceSplitter, m_Split, m_Tags, m_Tokenized, m_Tokenizer |
Constructor Summary | |
---|---|
CAllWordsFineTaskCorpus() | default constructor |
CAllWordsFineTaskCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer) | constructor taking the POS tagger, sentence splitter, tokenizer, and lemmatizer components |
Method Summary | |
---|---|
protected void | genInfo(): collect some information |
boolean | load(java.io.Reader p_Reader): load data into corpus |
protected java.lang.String | loadText(org.jdom.Element p_Text): load texts |
protected java.util.ArrayList<java.util.ArrayList<java.lang.String>> | split(java.util.ArrayList<java.lang.String> p_Texts): split paragraphs into sentences |
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus |
---|
tokenizeSentence |
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus |
---|
clear, getIndexInSentence, getLength, getLowerBoundary, getSentence, getSentenceID, getTag, getUpperBoundary, getValue, isReady, isValidInstance, isValidSentence, lemmatize, numOfSentences, posTag, setDelimiter, setLemmatized, setPOSTagged, setSplit, setTokenized, size, tokenize, toString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected java.util.regex.Pattern m_CleanPattern
Constructor Detail |
---|
public CAllWordsFineTaskCorpus()
public CAllWordsFineTaskCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
Parameters:
p_POSTagger - POS tagger
p_Splitter - sentence splitter
p_Tokenizer - tokenizer
p_Lemmatizer - lemmatizer
Method Detail |
---|
public boolean load(java.io.Reader p_Reader) throws java.lang.Exception
Description copied from interface: ICorpus
Specified by:
load in interface ICorpus
Overrides:
load in class CLexicalCorpus
Parameters:
p_Reader - reader of the input stream
Throws:
java.lang.Exception - exception while loading file

protected java.util.ArrayList<java.util.ArrayList<java.lang.String>> split(java.util.ArrayList<java.lang.String> p_Texts)
Parameters:
p_Texts - input
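
The shape of the value returned by split can be illustrated with a minimal sketch. It assumes that each input string is one paragraph and that the result holds one inner list of sentences per paragraph. The class name SplitSketch is hypothetical, and the JDK's java.text.BreakIterator stands in for the ISentenceSplitter component, so this is not the IMS implementation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Locale;

public class SplitSketch {
    // Hypothetical stand-in for split(p_Texts): returns one inner list
    // of sentences for each input paragraph, in input order.
    static ArrayList<ArrayList<String>> split(ArrayList<String> texts) {
        ArrayList<ArrayList<String>> result = new ArrayList<>();
        for (String text : texts) {
            ArrayList<String> sentences = new ArrayList<>();
            // Assumption: JDK sentence boundaries replace the IMS sentence splitter.
            BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
            it.setText(text);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                String sentence = text.substring(start, end).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
            result.add(sentences);
        }
        return result;
    }

    public static void main(String[] args) {
        ArrayList<String> paragraphs = new ArrayList<>();
        paragraphs.add("First sentence here. Second sentence here.");
        paragraphs.add("Only one.");
        // Prints the nested list: outer index = paragraph, inner index = sentence.
        System.out.println(split(paragraphs));
    }
}
```

The nested list keeps paragraph boundaries visible to the caller, which is one plausible reason for returning ArrayList<ArrayList<String>> rather than a flat list.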

protected java.lang.String loadText(org.jdom.Element p_Text) throws java.lang.Exception
Parameters:
p_Text - text element
Throws:
java.lang.Exception - exception while loading text element

protected void genInfo()
Description copied from class: ACorpus
Overrides:
genInfo in class CLexicalCorpus
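
The load(java.io.Reader) and loadText(org.jdom.Element) entries above suggest that the corpus is read as XML and processed one text element at a time. The following is a rough sketch of that flow under stated assumptions: the element names corpus, text, and head, the sample input, and the class name LoadSketch are all hypothetical, and the JDK's DOM parser is used in place of JDOM so that the example runs without extra libraries. It does not reproduce the actual IMS loading logic.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoadSketch {
    // Hypothetical analogue of load(p_Reader): parses XML from a Reader and
    // returns the plain text of every <text> element (element name assumed).
    static List<String> loadTexts(Reader reader) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
            NodeList texts = doc.getElementsByTagName("text");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < texts.getLength(); i++) {
                Element text = (Element) texts.item(i);
                // Rough analogue of loadText(p_Text): flatten the element to a string.
                result.add(text.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            // The documented load method declares throws Exception; this sketch
            // wraps checked exceptions so that callers need no throws clause.
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input; real task files may differ in structure.
        String xml = "<corpus><text id=\"d000\">The <head id=\"d000.s000.t001\">art</head>"
                + " of ringing .</text></corpus>";
        System.out.println(loadTexts(new StringReader(xml)));
    }
}
```

Accepting a Reader rather than a file path lets the same code read from files, strings, or network streams, which matches the load(java.io.Reader) signature documented above.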