1
Understanding Documents via Concept Links



2
Outline
  • DUC 2005 System Task
  • Targeted Sentences
  • Our Approach
    • System Overview
    • Concept Link
    • Sentence Similarity
    • Sentence Ranker: A modified MMR
  • Evaluation
  • Conclusions
3
DUC 2005 System Task
  • Task Definition in [Amigo et al, 04]
    • … topic-oriented, informative multi-document summarization, … compressed version of a set of documents  …
  • Topic Creation Instructions
    • to formulate a topic out of interesting aspects
    • “At least 25 documents must each contribute some material to the answer” of a quest of the topic
  • Our view of the task
    • A general, topic-oriented summary.
4
Targeted Sentences
  • A good DUC 2005 summary: an extract consisting of sentences that are
    • highly representative
    • highly relevant to the topic
      • General
      • Specific: named entities are favored
    • minimally redundant
5
System Overview
6
Concept Detection
7
Concept Link
  • There exists a Concept Link between each pair of similar concepts
  • Concept Similarity: maximal sense overlap (Banerjee et al., 2003)
    • Consider all senses of each concept
      • Extended sense Sx: synset + gloss + hypernymy + meronymy set (1 level)
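
A minimal sketch of the "maximal sense overlap" idea, assuming each extended sense is represented as a plain set of words and that overlap is counted as the size of the set intersection. The toy sense inventory, the function names, and the link threshold are all hypothetical; a real system would build the word sets from WordNet, and the cited measure may score overlaps differently.

```python
# Toy sense inventory: each concept maps to a list of extended senses,
# and each extended sense is a set of words. All entries are made up
# for illustration; they are not taken from WordNet.
EXTENDED_SENSES = {
    "war": [
        {"war", "warfare", "armed", "conflict", "military", "action"},
        {"war", "struggle", "campaign", "effort"},
    ],
    "conflict": [
        {"conflict", "battle", "fight", "armed", "military", "action"},
        {"conflict", "disagreement", "dispute", "opposition"},
    ],
    "carpet": [
        {"carpet", "rug", "floor", "covering", "fabric"},
    ],
}


def concept_similarity(a, b, senses=EXTENDED_SENSES):
    """Return the largest word overlap over all sense pairs of a and b."""
    best = 0
    for sense_a in senses.get(a, []):
        for sense_b in senses.get(b, []):
            best = max(best, len(sense_a & sense_b))
    return best


def has_concept_link(a, b, threshold=2, senses=EXTENDED_SENSES):
    """Hypothetical rule: link two concepts once the overlap reaches a threshold."""
    return concept_similarity(a, b, senses) >= threshold
```

With this toy data, "war" and "conflict" share four words in their best-matching senses and would be linked, while "war" and "carpet" share none.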
8
Concept Link Detection
  • 1) A year ago Mr Douglas Hurd foreign secretary became the first UK cabinet minister to visit Argentina since the 1982 Falkland islands conflict.
  • 2) Today Argentina gets out the red carpet for the UK Duke of York the first official royal visitor since the end of the Anglo Argentine Falklands war in 1982.
9
Concept Links between sentences
10
Sentence Similarity
  • Sum of the “strengths” of the concept links between two sentences
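
A minimal sketch of this sum, assuming each sentence has already been reduced to a list of concepts and that link strengths are available in a lookup table. The concept lists and strength values below are hypothetical and only loosely based on the two Falklands example sentences; the slides do not give the actual values.

```python
def sentence_similarity(concepts_a, concepts_b, link_strength):
    """Sum the strengths of all concept links between two sentences."""
    total = 0.0
    for a in concepts_a:
        for b in concepts_b:
            # Links are treated as undirected, so check both key orders.
            total += link_strength.get((a, b), link_strength.get((b, a), 0.0))
    return total


# Hypothetical concepts and link strengths, for illustration only.
s1 = ["minister", "Argentina", "conflict"]
s2 = ["Argentina", "visitor", "war"]
links = {
    ("Argentina", "Argentina"): 1.0,
    ("conflict", "war"): 0.5,
    ("minister", "visitor"): 0.25,
}
score = sentence_similarity(s1, s2, links)
```

Concept pairs with no entry in the table contribute nothing, so two sentences with no linked concepts get a similarity of zero.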
11
Sentence ranker
  • Original Weight: Representative Power
12
Sentence ranker
  • Modified MMR
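
For orientation, a sketch of standard greedy MMR (maximal marginal relevance) selection. The slide does not spell out the modification, so this is the textbook form rather than the system's ranker. The parameter `lam`, the function name, and the toy weights and similarity matrix are assumptions.

```python
def mmr_select(weights, sim, k, lam=0.7):
    """Greedily pick k sentence indices, trading weight against redundancy.

    weights[i] is the original weight of sentence i; sim[i][j] is the
    similarity between sentences i and j. This is standard MMR, not the
    modified variant used in the system described here.
    """
    selected = []
    remaining = list(range(len(weights)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the similarity to the closest already-picked sentence.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * weights[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 is different.
weights = [0.9, 0.85, 0.4]
sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
picked = mmr_select(weights, sim, k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate sentence 1 is skipped in favour of the lower-weighted but novel sentence 2; with `lam=1.0` redundancy is ignored and the two highest-weighted sentences are chosen.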
13
Evaluation: ROUGE
14
Evaluation: Pyramid
15
 
16
Experiments:
17
Conclusions:
    A simple system featuring:
  • Concept Link: new way to calculate sentence similarity;
    • no chunker/parser involved
    • concept differs from NPs in Lexical Chain
  • Considering sentence similarity/relatedness via Concept Link:
    • Alleviates the influence of variations in expression (but might involve inaccurate sense guesses)
    • Outperforms the word co-occurrence approach
  • Minimizing Redundancy via Modified MMR;
  • No extra heuristics involved.
18
Future work
  • Error analysis;
  • How to automatically set parameters;
  • Comparison with alternative Similarity Measures;
  • How about more knowledge (syntactic, semantic parsers …)?
  • …