Fundamentals of Information Retrieval
Module 3               Min-Yen KAN
Evaluation Metrics

References for Today
Witten, Moffat and Bell (1999). Managing Gigabytes. Chapters 3-5.

Evaluation Contingency Table

Sensitivity, specificity, positive and negative predictive value
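In outline, the contingency table is the standard 2x2 cross of the system's retrieval decision against true relevance (the cell labels below are the usual ones, not copied from the slide):

                     Relevant          Not relevant
    Retrieved        true positive     false positive
    Not retrieved    false negative    true negative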

Evaluation Metrics
Precision = Positive Predictive Value
“the ratio of the number of relevant documents retrieved to the total number of documents retrieved”
Asks: how much extra, non-relevant material did you retrieve?
Recall = Sensitivity
“the ratio of the number of relevant documents retrieved for a given query to the total number of relevant documents for that query in the database”
Asks: how many relevant documents did you miss?
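A minimal Python sketch, assuming hypothetical counts tp, fp, fn, tn read off a contingency table like the one above (the function name is mine, not from the slides):

    def contingency_metrics(tp, fp, fn, tn):
        """All four measures follow directly from the 2x2 counts."""
        sensitivity = tp / (tp + fn)   # = recall
        specificity = tn / (tn + fp)
        ppv = tp / (tp + fp)           # positive predictive value = precision
        npv = tn / (tn + fn)           # negative predictive value
        return sensitivity, specificity, ppv, npv

    # Hypothetical cutoff: 4 relevant and 6 non-relevant docs retrieved,
    # 6 relevant docs missed, 9 non-relevant docs correctly left behind
    print(contingency_metrics(tp=4, fp=6, fn=6, tn=9))   # (0.4, 0.6, 0.4, 0.6)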

P/R: an example (10 relevant documents in the collection; recall reaches 100% at rank 22)

Rank  Relevant?  Recall@r  Precision@r
  1      R          10%       100%
  2                  10%        50%
  3                  10%        33%
  4      R          20%        50%
  5      R          30%        60%
  6                  30%        50%
  7      R          40%        57%
  8                  40%        50%
  9                  40%        44%
 10                  40%        40%
 11                  40%        36%
 12      R          50%        42%
 13      R          60%        46%
 14      R          70%        50%
 ...            (ranks 15-21 not shown)
 22      R         100%        45%
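A rough Python sketch of how the two columns are computed; the relevant ranks are read off the table above, and since the slide does not show where relevant documents 8 and 9 fall (somewhere in ranks 15-21), only the first 14 rows are reproduced:

    # Ranks judged relevant in the example (top 14 only, as read from the table)
    relevant_ranks = {1, 4, 5, 7, 12, 13, 14}
    total_relevant = 10              # 100% recall is reached at rank 22

    hits = 0
    for r in range(1, 15):
        if r in relevant_ranks:
            hits += 1
        print(f"rank {r:2d}  R@r = {hits / total_relevant:4.0%}  "
              f"P@r = {hits / r:4.0%}")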

Precision / Recall
Interpolated precision gives a non-increasing curve (sketched below)
But neither precision nor recall factors in the size of the corpus, since neither uses the documents correctly left unretrieved:
the previous example gives 40% precision on a corpus of 25 docs,
and also 40% on a corpus of 2.5 M docs
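A small Python sketch of the usual interpolation rule (interpolated precision at recall level r is the highest precision achieved at any recall >= r), which is what makes the curve non-increasing; the (recall, precision) points are taken from the example table:

    def interpolated_precision(points, r):
        """Highest precision over all points whose recall is at least r."""
        return max(p for recall, p in points if recall >= r)

    # (recall, precision) pairs from the example ranked list
    points = [(0.1, 1.00), (0.2, 0.50), (0.3, 0.60), (0.4, 0.57),
              (0.5, 0.42), (0.6, 0.46), (0.7, 0.50), (1.0, 0.45)]
    print(interpolated_precision(points, 0.2))   # 0.6, borrowed from the 30% recall point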

Factoring in size of a corpus
Look at how P/R or Sn/Sp varies as a function of rank:
Choose a number of different rank cutoffs and calculate P/R or Sn/Sp at each
(these cutoffs correspond to vertical lines on the ranked-list graphs)
Plot Sn vs. 1-Sp to get points for the ROC curve, then interpolate the curve (see the sketch below)
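A Python sketch of how a single ROC point falls out of a rank cutoff, assuming the running example (10 relevant documents in total) and a corpus of size N; unlike precision/recall, the result depends on N because the true negatives enter the computation:

    def roc_point(rank, hits, total_relevant, corpus_size):
        """Return (1 - specificity, sensitivity) for a cutoff at `rank`."""
        tp = hits                               # relevant docs retrieved so far
        fn = total_relevant - hits              # relevant docs still missed
        fp = rank - hits                        # non-relevant docs retrieved
        tn = corpus_size - total_relevant - fp  # non-relevant docs left behind
        return fp / (fp + tn), tp / (tp + fn)

    # Cutoff at rank 10 of the example: 4 relevant documents found so far
    print(roc_point(10, 4, 10, 25))             # (0.4, 0.4) on a 25-doc corpus
    print(roc_point(10, 4, 10, 2_500_000))      # (~0.0000024, 0.4) on 2.5 M docs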

ROC Curve
Look at the probability, or rate, of detection
What does the diagonal represent?
How do we compare ROC curves with one another?

Getting a single number
11-point average
average the interpolated precision at each 0.1 recall level from 0.0 to 1.0 (see the sketch after this list)
Precision at a fixed recall point (given as a % or an absolute rank cutoff)
F measure
weighted harmonic mean of precision and recall: F_b = (1 + b²)PR / (b²P + R)
(e.g., b = 3 weights recall more heavily; b < 1 weights precision more heavily)
Area under the ROC curve (a summary measure of accuracy)
1 = perfect, .9 excellent, .5 worthless
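A Python sketch of two of these single-number summaries, reusing the (recall, precision) points from the example; the helper names are mine, not from the slides:

    def eleven_point_average(points):
        """Average interpolated precision at recall = 0.0, 0.1, ..., 1.0."""
        levels = [i / 10 for i in range(11)]
        return sum(max(p for recall, p in points if recall >= r)
                   for r in levels) / len(levels)

    def f_measure(precision, recall, b=1.0):
        """Weighted harmonic mean; b > 1 leans towards recall, b < 1 towards precision."""
        return (1 + b * b) * precision * recall / (b * b * precision + recall)

    points = [(0.1, 1.00), (0.2, 0.50), (0.3, 0.60), (0.4, 0.57),
              (0.5, 0.42), (0.6, 0.46), (0.7, 0.50), (1.0, 0.45)]
    print(eleven_point_average(points))   # about 0.60 for the example
    print(f_measure(0.5, 0.7))            # F1 at rank 14 (P = 50%, R = 70%)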