1
Generic Soft Pattern Models for Definitional Question Answering
2
Patterns Are Everywhere
3
Two Methods of Pattern Matching
  • Hard Matching
    • Rule induction
    • Generalizing training instances into rules represented as regular expressions
    • Performing slot-by-slot matching
  • Soft Matching
    • Hidden Markov Models (HMM)
    • Soft pattern matching for definitional QA (Cui et al., 2004)
4
Hard Matching
  • Lack of flexibility in matching
    • Cannot handle gaps between rules and test instances (illustrated in the sketch below)
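For illustration, a minimal sketch of why hard matching breaks on gaps, assuming a hypothetical rule "<TERM> is a|an|the <description>" induced from training instances (the rule and helper are illustrative, not from the paper):

```python
import re

# Hypothetical hard-matching rule generalized from training instances
# into a regular expression: "<TERM> is a|an|the <description>".
def make_rule(term: str) -> re.Pattern:
    return re.compile(rf"{re.escape(term)}\s+is\s+(a|an|the)\s+(\w+)")

rule = make_rule("battery")
# Exact match succeeds; the description slot captures "cell".
print(rule.search("A battery is a cell which can provide electricity."))
# One extra token ("simply") opens a gap and the rule fails outright.
print(rule.search("A battery is simply a cell which can provide electricity."))
```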
5
Soft Matching (Cui et al., 2004)
  • … The channel Iqra is owned by the …
  • … severance packages, known as golden parachutes, included …
  • A battery is a cell which can provide electricity.
6
Weakness of Current Soft Matching Models
  • Ad hoc model parameter estimation
    • Cui et al., 2004: lack of formalization
  • Not generic
    • Task-specific HMM topology
      • Difficult to port to other applications

7
In This Paper …
  • Propose two generic soft pattern models
    • Bigram model
      • Formalization of our previous model
    • Profile Hidden Markov Model (PHMM)
      • More complex model that handles gaps better
  • Parameter estimation by EM algorithm
  • Evaluations on definitional question answering
    • The models can also be applied to other pattern-matching applications


8
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


9
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


10
Definitional QA
  • To answer questions like “Who is Gunter Blobel” or “What is Wicca”.
  • Why evaluate on definition sentence retrieval?
    • Diverse patterns
    • Definitional QA is one of the least explored areas in QA

11
Pattern Matching for Definitional QA
12
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


13
Bigram Soft Pattern Model
  • Interpolation mixture weight λ
    • Estimated by the Expectation Maximization (EM) algorithm (see the sketch below)
  • Count words and general tags separately
    • Keeps the high frequency counts of general tags from overwhelming the word counts
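A minimal sketch of the interpolation idea, assuming training instances are token windows aligned around the search term; the class and parameter names are illustrative, and the paper's exact formulation differs in detail:

```python
from collections import Counter, defaultdict

# Minimal sketch, not the paper's exact formulation: score a token
# window by interpolating position-specific (slot) probabilities with
# bigram probabilities; the mixture weight lambda is fit by EM.
class BigramSoftPattern:
    def __init__(self, instances, smooth=1e-4, vocab_size=1000):
        self.pos_counts = defaultdict(Counter)     # slot -> token counts
        self.bigram_counts = defaultdict(Counter)  # prev token -> next counts
        self.smooth, self.vocab = smooth, vocab_size
        for tokens in instances:
            for i, tok in enumerate(tokens):
                self.pos_counts[i][tok] += 1
                if i > 0:
                    self.bigram_counts[tokens[i - 1]][tok] += 1

    def _p(self, counts, tok):
        # Smoothed estimate: unseen tokens get a small non-zero probability
        return (counts[tok] + self.smooth) / (sum(counts.values())
                                              + self.smooth * self.vocab)

    def components(self, tokens):
        p_slot = p_seq = 1.0
        for i, tok in enumerate(tokens):
            p_slot *= self._p(self.pos_counts[i], tok)  # slot-by-slot evidence
            if i > 0:
                p_seq *= self._p(self.bigram_counts[tokens[i - 1]], tok)
        return p_slot, p_seq

    def score(self, tokens, lam):
        p_slot, p_seq = self.components(tokens)
        return lam * p_slot + (1 - lam) * p_seq

def fit_lambda_em(model, held_out, lam=0.5, iters=20):
    # EM for the mixture weight: the hidden variable is which component
    # generated each held-out instance; M-step averages responsibilities.
    for _ in range(iters):
        resp = 0.0
        for tokens in held_out:
            p_slot, p_seq = model.components(tokens)
            resp += lam * p_slot / (lam * p_slot + (1 - lam) * p_seq + 1e-300)
        lam = resp / len(held_out)
    return lam
```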
14
Bigram Model in Dealing with Gaps
  • The bigram model can deal with gaps
    • Unseen tokens receive small smoothed probabilities at specific positions (usage example below)
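Hypothetical usage of the sketch above: a token unseen at a given position only shrinks the score rather than zeroing it out, which is why the gap that broke the hard-matching rule earlier is no longer fatal.

```python
# "simply" was never seen at slot 1, so it draws a small smoothed
# probability rather than making the whole match fail.
sp = BigramSoftPattern([["is", "a", "cell"], ["is", "a", "battery"]])
print(sp.score(["is", "simply", "a"], lam=0.5))  # small but non-zero
```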

15
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


16
PHMM Soft Pattern Model
  • A better solution for dealing with gaps
  • A left-to-right Hidden Markov Model with insertion and deletion states (topology sketched below)
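A sketch of the standard profile-HMM topology this implies; the state naming is the conventional one, not necessarily the paper's exact parameterization:

```python
# Standard profile-HMM topology sketch: match states M1..ML emit pattern
# tokens, insertion states I0..IL emit gap tokens, and silent deletion
# states D1..DL skip pattern positions. M0 stands in for the Begin state.
def phmm_transitions(L):
    edges = []
    for j in range(L):
        edges += [(f"M{j}", f"M{j+1}"),   # advance along the pattern
                  (f"M{j}", f"I{j}"),     # open an insertion gap
                  (f"M{j}", f"D{j+1}"),   # skip the next pattern position
                  (f"I{j}", f"I{j}"),     # extend the gap
                  (f"I{j}", f"M{j+1}")]   # close the gap, resume matching
        if j > 0:
            edges += [(f"D{j}", f"D{j+1}"),  # keep deleting
                      (f"D{j}", f"M{j+1}")]  # resume matching
    return edges
```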
17
How PHMM Deals with Gaps
  • Calculating the generative probability of a test instance
    • Find the most probable path with the Viterbi algorithm (sketch below)
    • Efficient calculation by the forward-backward algorithm
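A generic log-space Viterbi sketch; this is the textbook formulation over explicit emitting states, and it omits the silent deletion states a full PHMM would also handle:

```python
import math

# Generic log-space Viterbi sketch (assumes all probabilities are
# non-zero; silent states, as in a full PHMM, are omitted for brevity).
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s]: log-prob of the best path that ends in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            prev, best = max(((p, V[t-1][p] + math.log(trans_p[p][s]))
                              for p in states), key=lambda x: x[1])
            V[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(V[-1], key=V[-1].get)           # best final state
    path = [last]
    for t in range(len(obs) - 1, 0, -1):       # backtrack the path
        path.append(back[t][path[-1]])
    return V[-1][last], path[::-1]
```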
18
Estimation of PHMM
  • Estimated by the Baum-Welch algorithm
    • Uses the most probable path during training
  • Random or uniform initialization may lead to an unsatisfactory model
    • Definition patterns are extremely diverse and training data is limited
    • Assume the path should favor match states over others
      • P( token | Match ) > P ( token | Insertion )
    • Initialize with smoothed maximum likelihood (ML) estimates (sketch below)
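A sketch of the informed-initialization idea, assuming training windows are naively aligned by position; the alpha and vocab_size values are placeholders, and the paper's smoothed ML estimation differs in detail:

```python
from collections import Counter

# Sketch: initialize match-state emissions from smoothed ML counts at
# each aligned position, while insertion states start near-uniform, so
# P(token | Match) > P(token | Insertion) for tokens seen in patterns.
def init_emissions(training_windows, model_len, alpha=0.1, vocab_size=5000):
    match_emit = []
    for j in range(model_len):
        counts = Counter(w[j] for w in training_windows if len(w) > j)
        total = sum(counts.values())
        match_emit.append({tok: (c + alpha) / (total + alpha * vocab_size)
                           for tok, c in counts.items()})
    insert_emit = 1.0 / vocab_size  # uniform emission for insertion states
    return match_emit, insert_emit
```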
19
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
    • Overall performance evaluation
    • Sensitivity to model length
    • Sensitivity to size of training data
  • Conclusions


20
Evaluation Setup
  • Data set
    • Test data: TREC-13 question answering task data
      • AQUAINT corpus and 64 definition questions with answers
    • Training data
      • 761 manually labeled definition sentences from TREC-12 question answering task data
  • Comparison systems
    • State-of-the-art manually constructed patterns
      • The most comprehensive manually constructed patterns we are aware of
    • The previously proposed soft pattern method (Cui et al., 2004)



21
Evaluation Metrics
  • Manually checked F3 measure
    • Based on essential/acceptable answer nuggets
      • NR (nugget recall): proportion of essential answer nuggets returned
      • NP (nugget precision): penalizes longer answers
      • NR weighted three times as heavily as NP (formula below)
    • Subject to inconsistent scoring among assessors
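For reference, the TREC nugget F-measure with β = 3, which weights recall three times as heavily as precision:

```latex
F(\beta = 3) = \frac{(\beta^{2} + 1) \cdot NP \cdot NR}{\beta^{2} \cdot NP + NR},
\qquad \beta = 3
```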


  • Automatic ROUGE score
    • Gold standard: sentences containing answer nuggets
    • Counts the trigrams shared between the gold standard and system answers (sketch below)
    • ROUGE-3-ALL (R3A) and ROUGE-3-ESSENTIAL (R3E)
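A minimal sketch of the trigram-overlap recall being counted; this is not the official ROUGE implementation, and clipping and aggregation details are simplified:

```python
from collections import Counter

# Count clipped trigram overlap between gold and system answers and
# normalize by the gold trigram count, in the spirit of ROUGE-3 recall.
def trigrams(tokens):
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

def rouge3_recall(gold_tokens, system_tokens):
    gold, system = trigrams(gold_tokens), trigrams(system_tokens)
    overlap = sum(min(count, system[g]) for g, count in gold.items())
    return overlap / max(sum(gold.values()), 1)
```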
22
Performance Evaluation
  • Soft pattern matching outperforms hard matching
  • Bigram and PHMM models perform better than the previously proposed soft pattern method
    • Previous soft pattern method is not optimized
  • Manual F3 scores correlate well with automatic R3 scores


23
Sensitivity to Model Length
  • PHMM is less sensitive to model length
  • PHMM may handle longer sequences
24
Sensitivity to the Amount of Training Data
  • PHMM requires more training data to improve


25
Discussions on Both Models
  • Capture the same information
    • The importance of a token’s position in the context of the search term
    • The sequential order of tokens
  • Different in complexity
    • Bigram model
      • Simplified Markov model with each token as a state
      • Captures token sequential information by bigram probabilities
    • PHMM model
      • More complex: aggregates token sequence information through hidden state transition probabilities
  • Experimental results show
    • PHMM is less sensitive to model length
    • PHMM may benefit more by using more training data
26
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


27
Conclusions
  • Proposed Bigram model and PHMM model
    • Generic in form
    • Systematic parameter estimation by EM algorithm
  • These two models can be applied to other applications using surface text patterns
    • Soft patterns have been applied to information extraction (Xiao et al., 2004)
    • PHMM is more flexible in dealing with gaps, but requires more training data to converge
28
Q & A