java.lang.Object
  sg.edu.nus.comp.nlp.ims.corpus.ACorpus
      sg.edu.nus.comp.nlp.ims.corpus.CAllWordsPlainCorpus
public class CAllWordsPlainCorpus extends ACorpus
Corpus for plain text. Extracts all the content words according to the POS tagging result.
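The class description says content words are selected from the POS tagging result but does not say which tags count. The following is a minimal sketch of that idea only, assuming Penn Treebank tags and treating nouns, verbs, adjectives, and adverbs as content words. `ContentWordSketch`, `isContentTag`, and `contentWords` are hypothetical names and are not part of the IMS API; the library's own rule may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not IMS code): keeps tokens whose POS tag marks a content word. */
final class ContentWordSketch {

    /** Assumption: Penn Treebank tags; NN* nouns, VB* verbs, JJ* adjectives, RB* adverbs. */
    static boolean isContentTag(String tag) {
        return tag.startsWith("NN") || tag.startsWith("VB")
            || tag.startsWith("JJ") || tag.startsWith("RB");
    }

    /** Returns the tokens whose tag at the same index is a content tag. */
    static List<String> contentWords(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (isContentTag(tags[i])) {
                result.add(tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "cat", "quickly", "chased", "a", "small", "mouse", "."};
        String[] tags   = {"DT",  "NN",  "RB",      "VBD",    "DT", "JJ",   "NN",    "."};
        // prints [cat, quickly, chased, small, mouse]
        System.out.println(contentWords(tokens, tags));
    }
}
```

A prefix test keeps the sketch short, but it also admits tags such as auxiliary verbs that a real word sense disambiguation pipeline might exclude.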
Field Summary |
---|
Fields inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus |
---|
g_LIDX, g_PIDX, g_TIDX, m_Boundaries, m_DefaultDelimiter, m_Delimiter, m_DocIDs, m_IDs, m_Indice, m_InstanceLemmas, m_InstancePOSs, m_InstanceTokens, m_Lemmatized, m_Lemmatizer, m_Lengths, m_LexeltIDs, m_POSTagged, m_POSTagger, m_Ready, m_SatID2Index, m_SatIDs, m_SatIndice, m_SatSentenceIDs, m_SentenceIDs, m_Sentences, m_SentenceSplitter, m_Split, m_Tags, m_Tokenized, m_Tokenizer |
Constructor Summary | |
---|---|
CAllWordsPlainCorpus()
default constructor |
CAllWordsPlainCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
constructor with some components |
Method Summary | |
---|---|
protected void | genInfo()
collect some information |
boolean | load(java.io.Reader p_Reader)
load data into corpus |
protected java.util.ArrayList<java.util.ArrayList<java.lang.String>> | split(java.util.ArrayList<java.util.ArrayList<java.lang.String>> p_Texts)
split paragraph into sentences |
protected void | tokenizeSentence(java.lang.String p_Sentence)
tokenize a sentence |
Methods inherited from class sg.edu.nus.comp.nlp.ims.corpus.ACorpus |
---|
clear, getIndexInSentence, getLength, getLowerBoundary, getSentence, getSentenceID, getTag, getUpperBoundary, getValue, isReady, isValidInstance, isValidSentence, lemmatize, numOfSentences, posTag, setDelimiter, setLemmatized, setPOSTagged, setSplit, setTokenized, size, tokenize, toString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public CAllWordsPlainCorpus()
default constructor

public CAllWordsPlainCorpus(IPOSTagger p_POSTagger, ISentenceSplitter p_Splitter, ITokenizer p_Tokenizer, ILemmatizer p_Lemmatizer)
constructor with some components
Parameters:
p_POSTagger - POS tagger
p_Splitter - sentence splitter
p_Tokenizer - tokenizer
p_Lemmatizer - lemmatizer

Method Detail |
---|
public boolean load(java.io.Reader p_Reader) throws java.lang.Exception
Description copied from interface: ICorpus
load data into corpus
Specified by:
load in interface ICorpus
load in class ACorpus
Parameters:
p_Reader - reader of the input stream
Throws:
java.lang.Exception - exception while loading file

protected java.util.ArrayList<java.util.ArrayList<java.lang.String>> split(java.util.ArrayList<java.util.ArrayList<java.lang.String>> p_Texts)
split paragraph into sentences
Parameters:
p_Texts - paragraph
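
The nested-list signature of split can be illustrated with a small standalone sketch. It is not the library's implementation: it assumes each inner list holds the paragraph strings of one text and that the result keeps one inner list of sentences per text, and it uses the JDK's `java.text.BreakIterator` in place of whatever `ISentenceSplitter` is configured. `SplitSketch` is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Locale;

/** Hypothetical sketch (not IMS code) of a paragraph-to-sentence split over nested lists. */
final class SplitSketch {

    /**
     * Assumption: each inner list holds the paragraphs of one text.
     * Returns one inner list of trimmed sentences per input text.
     */
    static ArrayList<ArrayList<String>> split(ArrayList<ArrayList<String>> texts) {
        ArrayList<ArrayList<String>> result = new ArrayList<>();
        // JDK sentence boundary detector, standing in for an ISentenceSplitter
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        for (ArrayList<String> paragraphs : texts) {
            ArrayList<String> sentences = new ArrayList<>();
            for (String paragraph : paragraphs) {
                boundaries.setText(paragraph);
                int start = boundaries.first();
                for (int end = boundaries.next(); end != BreakIterator.DONE;
                         start = end, end = boundaries.next()) {
                    String sentence = paragraph.substring(start, end).trim();
                    if (!sentence.isEmpty()) {
                        sentences.add(sentence);
                    }
                }
            }
            result.add(sentences);
        }
        return result;
    }

    public static void main(String[] args) {
        ArrayList<String> paragraphs = new ArrayList<>();
        paragraphs.add("Hello world. How are you?");
        ArrayList<ArrayList<String>> texts = new ArrayList<>();
        texts.add(paragraphs);
        System.out.println(split(texts));
    }
}
```

Because split is protected, a caller would normally not invoke it directly; the sketch only shows the shape of the data going in and out under the stated assumptions.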
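
protected void tokenizeSentence(java.lang.String p_Sentence)
Description copied from class: ACorpus
tokenize a sentence
Specified by:
tokenizeSentence in class ACorpus
Parameters:
p_Sentence - input sentence

protected void genInfo()
Description copied from class: ACorpus
collect some information
Specified by:
genInfo in class ACorpus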
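
The only public entry point documented on this page is load(java.io.Reader), which returns a boolean and is declared to throw java.lang.Exception. The sketch below shows one way a caller might drive a method of that shape from in-memory text. It is self-contained and does not use the IMS classes: `LoadableCorpus`, `LineCountingCorpus`, and `LoadUsageSketch` are hypothetical stand-ins, and the toy implementation only counts non-empty lines rather than running any tagging pipeline.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stand-in mirroring only the documented load(Reader) signature. */
interface LoadableCorpus {
    boolean load(Reader p_Reader) throws Exception;
}

/** Toy implementation (not IMS code): counts non-empty lines read from the reader. */
final class LineCountingCorpus implements LoadableCorpus {
    private int lines = 0;

    @Override
    public boolean load(Reader p_Reader) throws Exception {
        BufferedReader reader = new BufferedReader(p_Reader);
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                lines++;
            }
        }
        // Assumption: "true" means at least some data was loaded
        return lines > 0;
    }

    int lineCount() {
        return lines;
    }
}

final class LoadUsageSketch {

    /** Wraps the text in a StringReader, calls load, and maps any exception to false. */
    static boolean loadText(LoadableCorpus corpus, String text) {
        try (Reader reader = new StringReader(text)) {
            return corpus.load(reader);
        } catch (Exception e) {
            // load is declared to throw java.lang.Exception, so the caller must handle it
            return false;
        }
    }

    public static void main(String[] args) {
        LineCountingCorpus corpus = new LineCountingCorpus();
        System.out.println(loadText(corpus, "First paragraph.\n\nSecond paragraph.\n")); // prints true
        System.out.println(corpus.lineCount()); // prints 2
    }
}
```

With the real class, the same pattern would apply to a CAllWordsPlainCorpus built with either documented constructor. This page does not state what the returned boolean signifies, so the meaning used in the toy implementation is an assumption.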