The objective of this miniature is to illustrate the applications of:
Before presenting the above applications, here is some backgrounders on the topic.
Information retrieval systems usually involve taking in a query from the user and displaying the results from the database that matches the query. For example, an information retrieval system is LINC as used by the NUS Libraries. Researchers often want to measure the performance of an information retrieval systems, to see whether the query results really answer the queries. We define the following terms:
Of course, we want to construct a system that retrieves as many relevant items as possible, and excludes as many non-relevant items as possible. There are two standard measures that are used to measure system performance:
The individual measures taken alone have their own deficiencies. A system that returns only a single result that is known to be relevant to the query is definitely going to have a perfect precision of 1, and a system that returns every single item from the database is going to have a prefect recall of 1. Hence, both measures have to be taken together, and for convenience sake, it is easier to represent the combined results as a single number. This number is known as F-score, which is defined as:
The F-score is a value from 0 to 1 inclusive. Note that beta is a parameter to the F-score, and higher beta values would favour recall over precision. Often, F(1)-score is used, that is, precision and recall is given equal weights.
The following is a complete header file source listing for a class that computes precision, recall and F-score. This source listing aims to demonstrate the application of the concepts listed in the objective. Please study the code to gain the most from this miniature.
// FScore.h
// Computes the precision, recall and F-score
#ifndef __FSCORE_H__
#define __FSCORE_H__
#include <set>
#include <iterator>
#include <algorithm>
using namespace std;
// Class definition
template <class T>
class FScore {
private:
set<T> retrieve;
set<T> relevant;
set<T> common;
bool dirty;
void calcCommon();
public:
FScore();
FScore(const set<T>& retrieve, const set<T>& relevant);
void clear();
void addRetrieve(const T& item);
void addRelevant(const T& item);
unsigned int getNumRetrieve();
unsigned int getNumRelevant();
unsigned int getNumCommon();
double getPrecision();
double getRecall();
double getFScore();
double getFScore(const unsigned int beta);
};
// Class implementation
template <class T>
FScore<T>::FScore() {
this->dirty = false;
}
template <class T>
FScore<T>::FScore(const set<T>& retrieve, const set<T>& relevant) {
this->retrieve = retrieve;
this->relevant = relevant;
this->dirty = true;
}
template <class T>
void FScore<T>::calcCommon() {
if (this->dirty) {
this->common.clear();
set_intersection(
this->retrieve.begin(), this->retrieve.end(),
this->relevant.begin(), this->relevant.end(),
inserter(this->common, this->common.begin()));
this->dirty = false;
}
}
template <class T>
void FScore<T>::clear() {
this->retrieve.clear();
this->relevant.clear();
this->common.clear();
this->dirty = false;
}
template <class T>
void FScore<T>::addRetrieve(const T& item) {
this->retrieve.insert(item);
this->dirty = true;
}
template <class T>
void FScore<T>::addRelevant(const T& item) {
this->relevant.insert(item);
this->dirty = true;
}
template <class T>
unsigned int FScore<T>::getNumRetrieve() {
return this->retrieve.size();
}
template <class T>
unsigned int FScore<T>::getNumRelevant() {
return this->relevant.size();
}
template <class T>
unsigned int FScore<T>::getNumCommon() {
calcCommon();
return this->common.size();
}
template <class T>
double FScore<T>::getPrecision() {
calcCommon();
return (double)this->common.size() / (double)this->retrieve.size();
}
template <class T>
double FScore<T>::getRecall() {
calcCommon();
return (double)this->common.size() / (double)this->relevant.size();
}
template <class T>
double FScore<T>::getFScore() {
return getFScore(1U);
}
template <class T>
double FScore<T>::getFScore(const unsigned int beta) {
double p;
double r;
p = getPrecision();
r = getRecall();
return ((double)(beta * beta + 1U) * (p * r)) /
(((double)(beta * beta) * p) + r);
}
#endif /* __FSCORE_H__ */
A few notes: