Fundamentals of Information Retrieval
Module 3               Min-Yen KAN
Evaluation Metrics

References for Today
Witten, Moffat and Bell (1999). Managing Gigabytes. Chapters 3-5.

Evaluation Contingency Table

Sensitivity, specificity, positive and negative predictive value
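In outline, the contingency table is the standard 2x2 cross of the system's retrieval decision against true relevance (the cell labels below are the usual ones, not copied from the slide):

                     Relevant          Not relevant
    Retrieved        true positive     false positive
    Not retrieved    false negative    true negative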

Evaluation Metrics
Precision = Positive Predictive Value
“the ratio of the number of relevant documents retrieved to the total number of documents retrieved”
Asks: how much extra, non-relevant material did you retrieve?
Recall = Sensitivity
“the ratio of the number of relevant documents retrieved for a given query to the total number of relevant documents for that query in the database”
Asks: how many relevant documents did you miss?
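A minimal Python sketch, assuming hypothetical counts tp, fp, fn, tn read off a contingency table like the one above (the function name is mine, not from the slides):

    def contingency_metrics(tp, fp, fn, tn):
        """All four measures follow directly from the 2x2 counts."""
        sensitivity = tp / (tp + fn)   # = recall
        specificity = tn / (tn + fp)
        ppv = tp / (tp + fp)           # positive predictive value = precision
        npv = tn / (tn + fn)           # negative predictive value
        return sensitivity, specificity, ppv, npv

    # Hypothetical cutoff: 4 relevant and 6 non-relevant docs retrieved,
    # 6 relevant docs missed, 9 non-relevant docs correctly left behind
    print(contingency_metrics(tp=4, fp=6, fn=6, tn=9))   # (0.4, 0.6, 0.4, 0.6)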

P/R: an example (10 relevant documents in the collection; recall reaches 100% at rank 22)

Rank  Relevant?  Recall@r  Precision@r
  1      R          10%       100%
  2                  10%        50%
  3                  10%        33%
  4      R          20%        50%
  5      R          30%        60%
  6                  30%        50%
  7      R          40%        57%
  8                  40%        50%
  9                  40%        44%
 10                  40%        40%
 11                  40%        36%
 12      R          50%        42%
 13      R          60%        46%
 14      R          70%        50%
 ...            (ranks 15-21 not shown)
 22      R         100%        45%
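A rough Python sketch of how the two columns are computed; the relevant ranks are read off the table above, and since the slide does not show where relevant documents 8 and 9 fall (somewhere in ranks 15-21), only the first 14 rows are reproduced:

    # Ranks judged relevant in the example (top 14 only, as read from the table)
    relevant_ranks = {1, 4, 5, 7, 12, 13, 14}
    total_relevant = 10              # 100% recall is reached at rank 22

    hits = 0
    for r in range(1, 15):
        if r in relevant_ranks:
            hits += 1
        print(f"rank {r:2d}  R@r = {hits / total_relevant:4.0%}  "
              f"P@r = {hits / r:4.0%}")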

Precision / Recall
Interpolated precision gives a non-increasing curve (sketched below)
But neither precision nor recall factors in the size of the corpus, since neither uses the documents correctly left unretrieved:
the previous example gives 40% precision on a corpus of 25 docs,
and also 40% on a corpus of 2.5 M docs
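A small Python sketch of the usual interpolation rule (interpolated precision at recall level r is the highest precision achieved at any recall >= r), which is what makes the curve non-increasing; the (recall, precision) points are taken from the example table:

    def interpolated_precision(points, r):
        """Highest precision over all points whose recall is at least r."""
        return max(p for recall, p in points if recall >= r)

    # (recall, precision) pairs from the example ranked list
    points = [(0.1, 1.00), (0.2, 0.50), (0.3, 0.60), (0.4, 0.57),
              (0.5, 0.42), (0.6, 0.46), (0.7, 0.50), (1.0, 0.45)]
    print(interpolated_precision(points, 0.2))   # 0.6, borrowed from the 30% recall point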

Factoring in size of a corpus
Look at how P/R or Sn/Sp varies as a function of rank:
Choose a number of different rank cutoffs and calculate P/R or Sn/Sp at each
(these cutoffs correspond to vertical lines on the ranked-list graphs)
Plot Sn vs. 1-Sp to get points for the ROC curve, then interpolate the curve (see the sketch below)
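A Python sketch of how a single ROC point falls out of a rank cutoff, assuming the running example (10 relevant documents in total) and a corpus of size N; unlike precision/recall, the result depends on N because the true negatives enter the computation:

    def roc_point(rank, hits, total_relevant, corpus_size):
        """Return (1 - specificity, sensitivity) for a cutoff at `rank`."""
        tp = hits                               # relevant docs retrieved so far
        fn = total_relevant - hits              # relevant docs still missed
        fp = rank - hits                        # non-relevant docs retrieved
        tn = corpus_size - total_relevant - fp  # non-relevant docs left behind
        return fp / (fp + tn), tp / (tp + fn)

    # Cutoff at rank 10 of the example: 4 relevant documents found so far
    print(roc_point(10, 4, 10, 25))             # (0.4, 0.4) on a 25-doc corpus
    print(roc_point(10, 4, 10, 2_500_000))      # (~0.0000024, 0.4) on 2.5 M docs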

ROC Curve
Look at the probability, or rate, of detection
What does the diagonal represent?
How do we compare ROC curves with one another?

Getting a single number
11-point average
average the interpolated precision at each 0.1 recall level from 0.0 to 1.0 (see the sketch after this list)
Precision at a fixed recall point (given as a % or an absolute rank cutoff)
F measure
weighted harmonic mean of precision and recall: F_b = (1 + b²)PR / (b²P + R)
(e.g., b = 3 weights recall more heavily; b < 1 weights precision more heavily)
Area under the ROC curve (a summary measure of accuracy)
1 = perfect, .9 excellent, .5 worthless
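A Python sketch of two of these single-number summaries, reusing the (recall, precision) points from the example; the helper names are mine, not from the slides:

    def eleven_point_average(points):
        """Average interpolated precision at recall = 0.0, 0.1, ..., 1.0."""
        levels = [i / 10 for i in range(11)]
        return sum(max(p for recall, p in points if recall >= r)
                   for r in levels) / len(levels)

    def f_measure(precision, recall, b=1.0):
        """Weighted harmonic mean; b > 1 leans towards recall, b < 1 towards precision."""
        return (1 + b * b) * precision * recall / (b * b * precision + recall)

    points = [(0.1, 1.00), (0.2, 0.50), (0.3, 0.60), (0.4, 0.57),
              (0.5, 0.42), (0.6, 0.46), (0.7, 0.50), (1.0, 0.45)]
    print(eleven_point_average(points))   # about 0.60 for the example
    print(f_measure(0.5, 0.7))            # F1 at rank 14 (P = 50%, R = 70%)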