java.lang.Object
  sg.edu.nus.comp.nlp.ims.corpus.ACorpus
    sg.edu.nus.comp.nlp.ims.corpus.CLexicalCorpus
public class CLexicalCorpus extends ACorpus
SensEval-2 lexical sample task test corpus. (Also used for training.)
Field Summary | |
---|---|
protected static java.lang.String | HEADEND |
protected static java.util.regex.Pattern | HEADENDPATTERN |
protected static java.util.regex.Pattern | HEADPATTERN |
protected static java.lang.String | HEADSTART |
protected static java.util.regex.Pattern | HEADSTARTPATTERN |
protected static java.lang.String | LEXELTMARK |
protected static java.util.regex.Pattern | LEXELTPATTERN |
protected static java.lang.String | SATEND |
protected static java.util.regex.Pattern | SATENDPATTERN |
protected static java.util.regex.Pattern | SATPATTERN |
protected static java.lang.String | SATSTART |
protected static java.util.regex.Pattern | SATSTARTPATTERN |
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus |
---|
g_LIDX, g_PIDX, g_TIDX, m_Boundaries, m_DefaultDelimiter, m_Delimiter, m_DocIDs, m_IDs, m_Indice, m_InstanceLemmas, m_InstancePOSs, m_InstanceTokens, m_Lemmatized, m_Lemmatizer, m_Lengths, m_LexeltIDs, m_POSTagged, m_POSTagger, m_Ready, m_SatID2Index, m_SatIDs, m_SatIndice, m_SatSentenceIDs, m_SentenceIDs, m_Sentences, m_SentenceSplitter, m_Split, m_Tags, m_Tokenized, m_Tokenizer |
Constructor Summary | |
---|---|
CLexicalCorpus() | Default constructor. |
CLexicalCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer) | Constructor that takes the POS tagger, sentence splitter, tokenizer, and lemmatizer components. |
Method Summary | |
---|---|
protected void | genInfo(): collect some information |
boolean | load(java.io.Reader p_Reader): load data into the corpus |
protected void | tokenizeSentence(java.lang.String p_Sentence): tokenize a sentence |
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus |
---|
clear, getIndexInSentence, getLength, getLowerBoundary, getSentence, getSentenceID, getTag, getUpperBoundary, getValue, isReady, isValidInstance, isValidSentence, lemmatize, numOfSentences, posTag, setDelimiter, setLemmatized, setPOSTagged, setSplit, setTokenized, size, tokenize, toString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected static java.lang.String HEADSTART
protected static java.lang.String HEADEND
protected static java.lang.String SATSTART
protected static java.lang.String SATEND
protected static java.util.regex.Pattern HEADPATTERN
protected static java.util.regex.Pattern HEADSTARTPATTERN
protected static java.util.regex.Pattern HEADENDPATTERN
protected static java.util.regex.Pattern SATPATTERN
protected static java.util.regex.Pattern SATSTARTPATTERN
protected static java.util.regex.Pattern SATENDPATTERN
protected static final java.lang.String LEXELTMARK
protected static java.util.regex.Pattern LEXELTPATTERN
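The values of the marker strings and patterns are not listed on this page. As a rough illustration of how a head-word pattern of this kind might be applied, the sketch below assumes that the target word is wrapped in `<head>...</head>` markup, as in the SensEval-2 lexical sample data. The class `HeadPatternSketch`, its regular expression, and its helper methods are hypothetical and are not taken from CLexicalCorpus.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: the regex below is an assumption, not the value of CLexicalCorpus.HEADPATTERN. */
class HeadPatternSketch {
    // Assumed pattern: a head element that may carry attributes (for example sats="...").
    static final Pattern HEADPATTERN = Pattern.compile("<head(?:\\s[^>]*)?>(.*?)</head>");

    /** Returns the text of the first head element, or null when no head markup is present. */
    static String findHead(String context) {
        Matcher m = HEADPATTERN.matcher(context);
        return m.find() ? m.group(1).trim() : null;
    }

    /** Removes the head markup and keeps the enclosed text. */
    static String stripHeadTags(String context) {
        return HEADPATTERN.matcher(context).replaceAll("$1");
    }

    public static void main(String[] args) {
        String context = "Their <head>art</head> is on display .";
        System.out.println(findHead(context));      // prints "art"
        System.out.println(stripHeadTags(context)); // prints "Their art is on display ."
    }
}
```

Precompiling the pattern once as a static field, as the class does for its own pattern constants, avoids recompiling the regular expression for every sentence.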
Constructor Detail |
---|
public CLexicalCorpus()
public CLexicalCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
Parameters:
p_POSTagger - POS tagger
p_Splitter - sentence splitter
p_Tokenizer - tokenizer
p_Lemmatizer - lemmatizer
Method Detail |
---|
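public boolean load(java.io.Reader p_Reader) throws java.lang.Exception
Description copied from interface: ICorpus
Load data into the corpus.
load in interface ICorpus
load in class ACorpus
Parameters:
p_Reader - reader of the input stream
Throws:
java.lang.Exception - exception while loading file

protected void tokenizeSentence(java.lang.String p_Sentence)
Description copied from class: ACorpus
Tokenize a sentence.
tokenizeSentence in class ACorpus
Parameters:
p_Sentence - input sentence

protected void genInfo()
Description copied from class: ACorpus
Collect some information.
genInfo in class ACorpus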
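
To make the shape of a `load(java.io.Reader)` call concrete, here is a minimal, self-contained sketch. It does not use the IMS classes. `LoadSketch` is a hypothetical stand-in that mirrors only the documented signature `boolean load(Reader) throws Exception`. The input layout (`<lexelt item="...">` and `<instance id="...">` elements, one per line) and the meaning of the boolean return value are assumptions, since this page specifies neither.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a corpus loader; not the CLexicalCorpus implementation. */
class LoadSketch {
    // Assumed markup for lexical elements and instances.
    private static final Pattern LEXELT = Pattern.compile("<lexelt\\s+item=\"([^\"]+)\"");
    private static final Pattern INSTANCE = Pattern.compile("<instance\\s+id=\"([^\"]+)\"");

    final List<String> instanceIds = new ArrayList<>();
    final List<String> lexeltIds = new ArrayList<>();

    /** Reads the stream line by line; returns true if at least one instance was collected (assumed semantics). */
    boolean load(Reader reader) throws Exception {
        String currentLexelt = null;
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher lex = LEXELT.matcher(line);
                if (lex.find()) {
                    currentLexelt = lex.group(1);
                    continue;
                }
                Matcher inst = INSTANCE.matcher(line);
                if (inst.find()) {
                    if (currentLexelt == null) {
                        return false; // an instance outside any lexelt is treated as malformed input
                    }
                    instanceIds.add(inst.group(1));
                    lexeltIds.add(currentLexelt);
                }
            }
        }
        return !instanceIds.isEmpty();
    }

    /** Number of instances collected so far. */
    int size() {
        return instanceIds.size();
    }

    public static void main(String[] args) throws Exception {
        String data = "<lexelt item=\"art.n\">\n"
                + "<instance id=\"art.40001\">\n"
                + "<context>\nTheir <head>art</head> is on display .\n</context>\n"
                + "</instance>\n</lexelt>\n";
        LoadSketch corpus = new LoadSketch();
        boolean ok = corpus.load(new StringReader(data));
        System.out.println(ok + " " + corpus.size());
    }
}
```

Accepting a `Reader` rather than a file name lets the caller pass a `FileReader`, a `StringReader`, or any other character stream, which keeps loading independent of where the data comes from.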