Information Content
- Entropy measures the purity of a set of examples
  - Normally denoted H(S)
- Equivalently viewed as information content: the less you need to know (to determine the class of a new case), the more information you already have
- With two classes (P, N):
  - IC(S) = -(p/t) log2(p/t) - (n/t) log2(n/t), where t = p + n
  - E.g., p = 9, n = 5:
    IC([9,5]) = -(9/14) log2(9/14) - (5/14) log2(5/14)
              = 0.940
  - Also, IC([14,0]) = 0 and IC([7,7]) = 1 (see the sketch below)
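
A minimal Python sketch of the two-class formula above, reproducing the slide's numbers; the function name information_content is illustrative, not from the source:

import math

def information_content(p, n):
    """Information content (entropy, in bits) of a set
    with p positive and n negative examples."""
    t = p + n
    ic = 0.0
    for count in (p, n):
        if count > 0:            # treat 0 * log2(0) as 0 for pure sets
            frac = count / t
            ic -= frac * math.log2(frac)
    return ic

# Reproduce the slide's examples
print(f"{information_content(9, 5):.3f}")   # 0.940 -> IC([9,5])
print(information_content(14, 0))           # 0.0   -> pure set
print(information_content(7, 7))            # 1.0   -> maximally mixed

The guard against zero counts matters: a pure set like [14,0] would otherwise call log2(0), whereas by convention its entropy is 0.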