sg.edu.nus.comp.nlp.ims.feature
Class CSurroundingWordExtractor

java.lang.Object
  extended by sg.edu.nus.comp.nlp.ims.feature.CSurroundingWordExtractor
All Implemented Interfaces:
IFeatureExtractor

public class CSurroundingWordExtractor
extends java.lang.Object
implements IFeatureExtractor

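Surrounding word extractor: an IFeatureExtractor that produces surrounding-word features for an instance from the sentences around it, skipping stop words and words that contain no alphabetic characters.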

Author:
zhongzhi

Field Summary
protected static int g_LIDX
           
protected static int g_TIDX
           
protected  ICorpus m_Corpus
           
protected  IFeature m_CurrentFeature
           
protected  CSurroundingWordFilter m_Filter
           
protected  int m_Index
           
protected  int m_IndexInSentence
           
protected  int m_InstanceLength
           
protected  int m_Left
           
protected  int m_Right
           
protected  ISentence m_Sentence
           
protected  int m_SurroundingWordIndex
           
protected  java.util.ArrayList<java.lang.String> m_SurroundingWords
           
protected  java.util.HashSet<java.lang.String> m_SurroundingWordSet
           
 
Constructor Summary
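CSurroundingWordExtractor()
          constructs an extractor with default settings
CSurroundingWordExtractor(java.util.HashSet<java.lang.String> p_StopWords)
          constructs an extractor with the given stop word list
CSurroundingWordExtractor(int p_Left, int p_Right)
          constructs an extractor with the given sentence window
CSurroundingWordExtractor(int p_Left, int p_Right, java.util.HashSet<java.lang.String> p_StopWords)
          constructs an extractor with the given sentence window and stop word list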
 
Method Summary
 boolean filter(java.lang.String p_Word)
          check whether a word is in the stop word list or contains no alphabetic characters
 java.lang.String getCurrentInstanceID()
          get the ID of the current instance being extracted
 boolean hasNext()
          check whether at least one more feature is available
 IFeature next()
          get the next feature
 boolean restart()
          restart the iterator
 boolean setCorpus(ICorpus p_Corpus)
          set the corpus to extract features from
 boolean setCurrentInstance(int p_Index)
          set the index of the instance in the corpus to extract features from
protected  boolean validIndex(int p_Index)
          check whether an index is valid
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_Corpus

protected ICorpus m_Corpus

m_Index

protected int m_Index

m_Sentence

protected ISentence m_Sentence

m_IndexInSentence

protected int m_IndexInSentence

m_InstanceLength

protected int m_InstanceLength

m_SurroundingWordIndex

protected int m_SurroundingWordIndex

m_SurroundingWordSet

protected java.util.HashSet<java.lang.String> m_SurroundingWordSet

m_SurroundingWords

protected java.util.ArrayList<java.lang.String> m_SurroundingWords

m_Left

protected int m_Left

m_Right

protected int m_Right

m_Filter

protected CSurroundingWordFilter m_Filter

m_CurrentFeature

protected IFeature m_CurrentFeature

g_LIDX

protected static int g_LIDX

g_TIDX

protected static int g_TIDX

Constructor Detail

CSurroundingWordExtractor

public CSurroundingWordExtractor()
constructs an extractor with default settings


CSurroundingWordExtractor

public CSurroundingWordExtractor(int p_Left,
                                 int p_Right)
constructs an extractor with the given sentence window

Parameters:
p_Left - number of sentences to the left of the current sentence from which surrounding words are extracted
p_Right - number of sentences to the right of the current sentence from which surrounding words are extracted
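
As an illustration of how p_Left and p_Right could bound the extraction window, the sketch below computes the inclusive range of sentence indices around the current sentence. The class SentenceWindowSketch, the method window, and the clamping to the first and last sentence are assumptions made for this example; the actual extractor may select sentences differently.

```java
class SentenceWindowSketch {
    /**
     * Returns {first, last}, the inclusive sentence indices of the window.
     * Hypothetical helper: clamps the window to [0, sentenceCount - 1].
     */
    static int[] window(int current, int left, int right, int sentenceCount) {
        // Use long arithmetic so very large left/right values cannot overflow.
        long first = Math.max(0L, (long) current - left);
        long last = Math.min((long) sentenceCount - 1, (long) current + right);
        return new int[] {(int) first, (int) last};
    }

    public static void main(String[] args) {
        int[] w = window(5, 1, 1, 10);
        System.out.println(w[0] + ".." + w[1]); // prints 4..6
    }
}
```

With p_Left = 0 and p_Right = 0 the window in this sketch contains only the current sentence.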

CSurroundingWordExtractor

public CSurroundingWordExtractor(java.util.HashSet<java.lang.String> p_StopWords)
constructs an extractor with the given stop word list

Parameters:
p_StopWords - stop word list

CSurroundingWordExtractor

public CSurroundingWordExtractor(int p_Left,
                                 int p_Right,
                                 java.util.HashSet<java.lang.String> p_StopWords)
constructs an extractor with the given sentence window and stop word list

Parameters:
p_Left - number of sentences to the left of the current sentence from which surrounding words are extracted
p_Right - number of sentences to the right of the current sentence from which surrounding words are extracted
p_StopWords - stop word list

Method Detail

getCurrentInstanceID

public java.lang.String getCurrentInstanceID()
Description copied from interface: IFeatureExtractor
get the ID of the current instance being extracted

Specified by:
getCurrentInstanceID in interface IFeatureExtractor
Returns:
instance ID

hasNext

public boolean hasNext()
Description copied from interface: IFeatureExtractor
check whether at least one more feature is available

Specified by:
hasNext in interface IFeatureExtractor
Returns:
true if another feature is available, false otherwise

next

public IFeature next()
Description copied from interface: IFeatureExtractor
get the next feature

Specified by:
next in interface IFeatureExtractor
Returns:
feature

restart

public boolean restart()
Description copied from interface: IFeatureExtractor
restart the iterator

Specified by:
restart in interface IFeatureExtractor
Returns:
true if the restart succeeded, false otherwise

setCorpus

public boolean setCorpus(ICorpus p_Corpus)
Description copied from interface: IFeatureExtractor
set the corpus to extract features from

Specified by:
setCorpus in interface IFeatureExtractor
Parameters:
p_Corpus - corpus to extract features from
Returns:
true if the corpus was set successfully, false otherwise

validIndex

protected boolean validIndex(int p_Index)
check whether an index is valid

Parameters:
p_Index - index to check
Returns:
true if the index is valid, false otherwise

filter

public boolean filter(java.lang.String p_Word)
check whether a word is in the stop word list or contains no alphabetic characters

Parameters:
p_Word - word to check
Returns:
true if the word should be filtered out, false otherwise
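
The filtering rule described above can be sketched as a small predicate. This is only an illustration: the class StopWordFilterSketch is hypothetical, and the exact-match stop word lookup and the use of Character.isLetter to decide what counts as alphabetic are assumptions, so the actual behavior of filter (and of CSurroundingWordFilter) may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopWordFilterSketch {
    private final Set<String> stopWords;

    StopWordFilterSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Returns true if the word should be filtered out. */
    boolean filter(String word) {
        // Assumption: stop words are matched exactly (case-sensitive).
        if (stopWords.contains(word)) {
            return true;
        }
        // Keep the word as soon as one alphabetic character is found.
        for (int i = 0; i < word.length(); i++) {
            if (Character.isLetter(word.charAt(i))) {
                return false;
            }
        }
        // No letters at all (digits, punctuation, or empty): filter it out.
        return true;
    }

    public static void main(String[] args) {
        StopWordFilterSketch f =
                new StopWordFilterSketch(new HashSet<>(Arrays.asList("the", "of")));
        System.out.println(f.filter("the"));  // prints true
        System.out.println(f.filter("123"));  // prints true
        System.out.println(f.filter("bank")); // prints false
    }
}
```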

setCurrentInstance

public boolean setCurrentInstance(int p_Index)
Description copied from interface: IFeatureExtractor
set the index of the instance in the corpus to extract features from

Specified by:
setCurrentInstance in interface IFeatureExtractor
Parameters:
p_Index - instance index
Returns:
true if the instance was set successfully, false otherwise
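
Taken together, setCurrentInstance, hasNext, next and restart describe an iterator-style protocol. The sketch below shows one way a caller might drive such a protocol. Because the IMS classes are not reproduced here, it uses a hypothetical stand-in interface FeatureSource that mirrors only those four signatures and returns String values in place of IFeature; ListFeatureSource and collect are likewise illustrative names, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExtractorLoopSketch {
    /** Hypothetical stand-in mirroring part of IFeatureExtractor; not the IMS interface. */
    interface FeatureSource {
        boolean setCurrentInstance(int index);
        boolean hasNext();
        String next();
        boolean restart();
    }

    /** Toy implementation backed by fixed word lists, one list per instance. */
    static class ListFeatureSource implements FeatureSource {
        private final List<List<String>> instances;
        private List<String> current = new ArrayList<>();
        private int position = 0;

        ListFeatureSource(List<List<String>> instances) {
            this.instances = instances;
        }

        public boolean setCurrentInstance(int index) {
            if (index < 0 || index >= instances.size()) {
                return false; // invalid index: report failure, keep the old state
            }
            current = instances.get(index);
            position = 0;
            return true;
        }

        public boolean hasNext() {
            return position < current.size();
        }

        public String next() {
            return hasNext() ? current.get(position++) : null;
        }

        public boolean restart() {
            position = 0;
            return true;
        }
    }

    /** Drains all features of one instance; returns an empty list if the index is rejected. */
    static List<String> collect(FeatureSource source, int index) {
        List<String> features = new ArrayList<>();
        if (!source.setCurrentInstance(index)) {
            return features;
        }
        while (source.hasNext()) {
            features.add(source.next());
        }
        return features;
    }

    public static void main(String[] args) {
        FeatureSource source = new ListFeatureSource(Arrays.asList(
                Arrays.asList("bank", "river"),
                Arrays.asList("money")));
        System.out.println(collect(source, 0)); // prints [bank, river]
        System.out.println(source.hasNext());   // prints false
        source.restart();
        System.out.println(source.next());      // prints bank
        System.out.println(collect(source, 5)); // prints []
    }
}
```

Checking the boolean returned by setCurrentInstance before iterating keeps the loop from reading features left over from a previously selected instance.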