1
Generic Soft Pattern Models for Definitional Question Answering
2
Patterns Are Everywhere
3
Two Methods of Pattern Matching
  • Hard Matching
    • Rule induction
    • Generalizing training instances into rules represented as regular expressions
    • Performing slot-by-slot matching
  • Soft Matching
    • Hidden Markov Models (HMM)
    • Soft pattern matching for definitional QA (Cui et al., 2004)
4
Hard Matching
  • Lack of flexibility in matching
    • Cannot handle gaps between rules and test instances (illustrated in the sketch below)
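For illustration, a minimal sketch of why hard matching breaks on gaps, assuming a hypothetical rule "<TERM> is a|an|the <description>" induced from training instances (the rule and helper are illustrative, not from the paper):

```python
import re

# Hypothetical hard-matching rule generalized from training instances
# into a regular expression: "<TERM> is a|an|the <description>".
def make_rule(term: str) -> re.Pattern:
    return re.compile(rf"{re.escape(term)}\s+is\s+(a|an|the)\s+(\w+)")

rule = make_rule("battery")
# Exact match succeeds; the description slot captures "cell".
print(rule.search("A battery is a cell which can provide electricity."))
# One extra token ("simply") opens a gap and the rule fails outright.
print(rule.search("A battery is simply a cell which can provide electricity."))
```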
5
Soft Matching (Cui et al., 2004)
  • … The channel Iqra is owned by the …
  • … severance packages, known as golden parachutes, included …
  • A battery is a cell which can provide electricity.
6
Weakness of Current Soft Matching Models
  • Ad hoc model parameter estimation
    • Cui et al., 2004: lack of formalization
  • Not generic
    • Task-specific HMM topology
      • Difficult to port to other applications

7
In This Paper …
  • Propose two generic soft pattern models
    • Bigram model
      • Formalization of our previous model
    • Profile Hidden Markov Model (PHMM)
      • More complex model that handles gaps better
  • Parameter estimation by EM algorithm
  • Evaluations on definitional question answering
    • The models can also be applied to other pattern-matching applications


8
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


9
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


10
Definitional QA
  • To answer questions like “Who is Gunter Blobel” or “What is Wicca”.
  • Why evaluate on definition sentence retrieval?
    • Diverse patterns
    • Definitional QA is one of the least explored areas in QA

11
Pattern Matching for Definitional QA
12
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


13
Bigram Soft Pattern Model
  • Interpolation mixture weight λ
    • Estimated by the Expectation Maximization (EM) algorithm (see the sketch below)
  • Count words and general tags separately
    • Keeps the high frequency counts of general tags from overwhelming the word counts
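A minimal sketch of the interpolation idea, assuming training instances are token windows aligned around the search term; the class and parameter names are illustrative, and the paper's exact formulation differs in detail:

```python
from collections import Counter, defaultdict

# Minimal sketch, not the paper's exact formulation: score a token
# window by interpolating position-specific (slot) probabilities with
# bigram probabilities; the mixture weight lambda is fit by EM.
class BigramSoftPattern:
    def __init__(self, instances, smooth=1e-4, vocab_size=1000):
        self.pos_counts = defaultdict(Counter)     # slot -> token counts
        self.bigram_counts = defaultdict(Counter)  # prev token -> next counts
        self.smooth, self.vocab = smooth, vocab_size
        for tokens in instances:
            for i, tok in enumerate(tokens):
                self.pos_counts[i][tok] += 1
                if i > 0:
                    self.bigram_counts[tokens[i - 1]][tok] += 1

    def _p(self, counts, tok):
        # Smoothed estimate: unseen tokens get a small non-zero probability
        return (counts[tok] + self.smooth) / (sum(counts.values())
                                              + self.smooth * self.vocab)

    def components(self, tokens):
        p_slot = p_seq = 1.0
        for i, tok in enumerate(tokens):
            p_slot *= self._p(self.pos_counts[i], tok)  # slot-by-slot evidence
            if i > 0:
                p_seq *= self._p(self.bigram_counts[tokens[i - 1]], tok)
        return p_slot, p_seq

    def score(self, tokens, lam):
        p_slot, p_seq = self.components(tokens)
        return lam * p_slot + (1 - lam) * p_seq

def fit_lambda_em(model, held_out, lam=0.5, iters=20):
    # EM for the mixture weight: the hidden variable is which component
    # generated each held-out instance; M-step averages responsibilities.
    for _ in range(iters):
        resp = 0.0
        for tokens in held_out:
            p_slot, p_seq = model.components(tokens)
            resp += lam * p_slot / (lam * p_slot + (1 - lam) * p_seq + 1e-300)
        lam = resp / len(held_out)
    return lam
```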
14
Bigram Model in Dealing with Gaps
  • The bigram model can deal with gaps
    • Unseen tokens receive small smoothed probabilities at specific positions (usage example below)
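Hypothetical usage of the sketch above: a token unseen at a given position only shrinks the score rather than zeroing it out, which is why the gap that broke the hard-matching rule earlier is no longer fatal.

```python
# "simply" was never seen at slot 1, so it draws a small smoothed
# probability rather than making the whole match fail.
sp = BigramSoftPattern([["is", "a", "cell"], ["is", "a", "battery"]])
print(sp.score(["is", "simply", "a"], lam=0.5))  # small but non-zero
```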

15
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


16
PHMM Soft Pattern Model
  • A better solution for dealing with gaps
  • A left-to-right Hidden Markov Model with insertion and deletion states (topology sketched below)
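A sketch of the standard profile-HMM topology this implies; the state naming is the conventional one, not necessarily the paper's exact parameterization:

```python
# Standard profile-HMM topology sketch: match states M1..ML emit pattern
# tokens, insertion states I0..IL emit gap tokens, and silent deletion
# states D1..DL skip pattern positions. M0 stands in for the Begin state.
def phmm_transitions(L):
    edges = []
    for j in range(L):
        edges += [(f"M{j}", f"M{j+1}"),   # advance along the pattern
                  (f"M{j}", f"I{j}"),     # open an insertion gap
                  (f"M{j}", f"D{j+1}"),   # skip the next pattern position
                  (f"I{j}", f"I{j}"),     # extend the gap
                  (f"I{j}", f"M{j+1}")]   # close the gap, resume matching
        if j > 0:
            edges += [(f"D{j}", f"D{j+1}"),  # keep deleting
                      (f"D{j}", f"M{j+1}")]  # resume matching
    return edges
```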
17
How PHMM Deals with Gaps
  • Calculating the generative probability of a test instance
    • Find the most probable path with the Viterbi algorithm (sketch below)
    • Efficient calculation by the forward-backward algorithm
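A generic log-space Viterbi sketch; this is the textbook formulation over explicit emitting states, and it omits the silent deletion states a full PHMM would also handle:

```python
import math

# Generic log-space Viterbi sketch (assumes all probabilities are
# non-zero; silent states, as in a full PHMM, are omitted for brevity).
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s]: log-prob of the best path that ends in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            prev, best = max(((p, V[t-1][p] + math.log(trans_p[p][s]))
                              for p in states), key=lambda x: x[1])
            V[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(V[-1], key=V[-1].get)           # best final state
    path = [last]
    for t in range(len(obs) - 1, 0, -1):       # backtrack the path
        path.append(back[t][path[-1]])
    return V[-1][last], path[::-1]
```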
18
Estimation of PHMM
  • Estimated by the Baum-Welch algorithm
    • Uses the most probable path during training
  • Random or uniform initialization may lead to an unsatisfactory model
    • Definition patterns are extremely diverse and training data is limited
    • Assume the path should favor match states over others
      • P( token | Match ) > P ( token | Insertion )
    • Initialize with smoothed maximum likelihood (ML) estimates (sketch below)
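A sketch of the informed-initialization idea, assuming training windows are naively aligned by position; the alpha and vocab_size values are placeholders, and the paper's smoothed ML estimation differs in detail:

```python
from collections import Counter

# Sketch: initialize match-state emissions from smoothed ML counts at
# each aligned position, while insertion states start near-uniform, so
# P(token | Match) > P(token | Insertion) for tokens seen in patterns.
def init_emissions(training_windows, model_len, alpha=0.1, vocab_size=5000):
    match_emit = []
    for j in range(model_len):
        counts = Counter(w[j] for w in training_windows if len(w) > j)
        total = sum(counts.values())
        match_emit.append({tok: (c + alpha) / (total + alpha * vocab_size)
                           for tok, c in counts.items()})
    insert_emit = 1.0 / vocab_size  # uniform emission for insertion states
    return match_emit, insert_emit
```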
19
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
    • Overall performance evaluation
    • Sensitivity to model length
    • Sensitivity to size of training data
  • Conclusions


20
Evaluation Setup
  • Data set
    • Test data: TREC-13 question answering task data
      • AQUAINT corpus and 64 definition questions with answers
    • Training data
      • 761 manually labeled definition sentences from TREC-12 question answering task data
  • Comparison systems
    • State-of-the-art manually constructed patterns
      • The most comprehensive manually constructed patterns we are aware of
    • The previously proposed soft pattern method (Cui et al., 2004)



21
Evaluation Metrics
  • Manually checked F3 measure
    • Based on essential/acceptable answer nuggets
      • NR (nugget recall): proportion of essential answer nuggets returned
      • NP (nugget precision): penalizes longer answers
      • NR weighted three times as heavily as NP (formula below)
    • Subject to inconsistent scoring among assessors
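For reference, the TREC nugget F-measure with β = 3, which weights recall three times as heavily as precision:

```latex
F(\beta = 3) = \frac{(\beta^{2} + 1) \cdot NP \cdot NR}{\beta^{2} \cdot NP + NR},
\qquad \beta = 3
```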


  • Automatic ROUGE score
    • Gold standard: sentences containing answer nuggets
    • Counts the trigrams shared between the gold standard and system answers (sketch below)
    • ROUGE-3-ALL (R3A) and ROUGE-3-ESSENTIAL (R3E)
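A minimal sketch of the trigram-overlap recall being counted; this is not the official ROUGE implementation, and clipping and aggregation details are simplified:

```python
from collections import Counter

# Count clipped trigram overlap between gold and system answers and
# normalize by the gold trigram count, in the spirit of ROUGE-3 recall.
def trigrams(tokens):
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

def rouge3_recall(gold_tokens, system_tokens):
    gold, system = trigrams(gold_tokens), trigrams(system_tokens)
    overlap = sum(min(count, system[g]) for g, count in gold.items())
    return overlap / max(sum(gold.values()), 1)
```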
22
Performance Evaluation
  • Soft pattern matching outperforms hard matching
  • Bigram and PHMM models perform better than the previously proposed soft pattern method
    • Previous soft pattern method is not optimized
  • Manual F3 scores correlate well with automatic R3 scores


23
Sensitivity to Model Length
  • PHMM is less sensitive to model length
  • PHMM may handle longer sequences
24
Sensitivity to the Amount of Training Data
  • PHMM requires more training data to improve


25
Discussions on Both Models
  • Capture the same information
    • The importance of a token’s position in the context of the search term
    • The sequential order of tokens
  • Different in complexity
    • Bigram model
      • Simplified Markov model with each token as a state
      • Captures token sequential information by bigram probabilities
    • PHMM model
      • More complex: aggregates token sequence information through hidden state transition probabilities
  • Experimental results show
    • PHMM is less sensitive to model length
    • PHMM may benefit more by using more training data
26
Outline
  • Overview of Definitional QA
  • Bigram Soft Pattern Model
  • PHMM Soft Pattern Model
  • Evaluations
  • Conclusions


27
Conclusions
  • Proposed Bigram model and PHMM model
    • Generic in form
    • Systematic parameter estimation by EM algorithm
  • These two models can be applied to other applications using surface text patterns
    • Soft patterns have been applied to information extraction (Xiao et al., 2004)
    • PHMM is more flexible in dealing with gaps, but requires more training data to converge
28
Q & A