2
- Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
- ⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
- or in distribution form:
- P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
- Useful for assessing diagnostic probability from causal probability:
- P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
- E.g., let m be meningitis, s be stiff neck:
- P(m | s) = P(s | m) P(m) / P(s) = 0.5 × 0.00002 / 0.05 = 0.0002
- Note: the posterior probability of meningitis is still very small!
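The diagnostic computation above can be checked directly; this sketch uses the classic meningitis numbers (P(s|m) = 0.5, P(m) = 1/50000, P(s) = 1/20):

```python
# Bayes' rule: P(m|s) = P(s|m) P(m) / P(s)
p_s_given_m = 0.5    # causal probability P(stiff neck | meningitis)
p_m = 1 / 50000      # prior P(meningitis) = 0.00002
p_s = 1 / 20         # P(stiff neck) = 0.05

p_m_given_s = p_s_given_m * p_m / p_s  # ≈ 0.0002
print(p_m_given_s)
```

Even though half of meningitis patients have a stiff neck, the posterior stays tiny because the prior P(m) is so small.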
3
- P(Cavity | toothache ∧ catch)
- = α P(toothache ∧ catch | Cavity) P(Cavity)
- = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
- This is an example of a naïve Bayes model:
- P(Cause, Effect_1, …, Effect_n) = P(Cause) Π_i P(Effect_i | Cause)
- Total number of parameters is linear in n
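The factored posterior can be sketched numerically; the conditional probabilities below are made up for illustration, not taken from the slides:

```python
# Naive Bayes posterior P(Cavity | toothache, catch).
# All numbers here are hypothetical illustrative values.
priors = {True: 0.2, False: 0.8}        # P(Cavity)
p_toothache = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}       # P(catch | Cavity)

# Unnormalized: P(toothache | Cavity) P(catch | Cavity) P(Cavity)
unnorm = {c: p_toothache[c] * p_catch[c] * priors[c] for c in (True, False)}
alpha = 1 / sum(unnorm.values())        # normalizing constant alpha
posterior = {c: alpha * v for c, v in unnorm.items()}
print(posterior)
```

Note that α never has to be computed from P(toothache ∧ catch) directly; normalizing the two unnormalized entries is enough.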
4
- Calculate the most probable function value:
- v_MAP = argmax_vj P(vj | a1, a2, …, an)
- = argmax_vj P(a1, a2, …, an | vj) P(vj) / P(a1, a2, …, an)
- = argmax_vj P(a1, a2, …, an | vj) P(vj)
- Naïve Bayes assumption (conditional independence): P(a1, a2, …, an | vj) = Π_i P(ai | vj)
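The denominator P(a1, …, an) is the same for every vj, so dropping it cannot change which vj attains the maximum. A tiny numeric check, with made-up likelihoods and priors:

```python
# Hypothetical values: P(a1,...,an | vj) and P(vj) for two classes.
likelihood = {"yes": 0.03, "no": 0.005}
prior = {"yes": 0.3, "no": 0.7}
evidence = sum(likelihood[v] * prior[v] for v in prior)  # P(a1,...,an)

# argmax with and without dividing by the evidence term
with_norm = max(prior, key=lambda v: likelihood[v] * prior[v] / evidence)
without_norm = max(prior, key=lambda v: likelihood[v] * prior[v])
print(with_norm == without_norm)  # True: same argmax either way
```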
5
- NaïveBayesLearn(examples):
-   for each target value vj:
-     P'(vj) ← estimate P(vj)
-     for each attribute value ai of each attribute a:
-       P'(ai | vj) ← estimate P(ai | vj)
- ClassifyNewInstance(x):
-   v_NB = argmax_vj P'(vj) Π_{ai ∈ x} P'(ai | vj)
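The two procedures above can be sketched with simple frequency estimates for P'(vj) and P'(ai | vj); the toy weather-style examples are invented for illustration:

```python
from collections import Counter, defaultdict

def naive_bayes_learn(examples):
    """NaiveBayesLearn: examples is a list of (attribute_tuple, target_value)."""
    n = len(examples)
    prior = Counter(v for _, v in examples)      # count of each target value vj
    cond = defaultdict(Counter)                  # cond[vj][(i, ai)] counts
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            cond[v][(i, a)] += 1
    p_v = {v: c / n for v, c in prior.items()}               # P'(vj)
    p_a_given_v = {v: {k: c / prior[v] for k, c in cnt.items()}
                   for v, cnt in cond.items()}               # P'(ai | vj)
    return p_v, p_a_given_v

def classify_new_instance(x, p_v, p_a_given_v):
    """v_NB = argmax_vj P'(vj) * prod_i P'(ai | vj)."""
    def score(v):
        s = p_v[v]
        for i, a in enumerate(x):
            s *= p_a_given_v[v].get((i, a), 0.0)  # unseen attribute value -> 0
        return s
    return max(p_v, key=score)

# Made-up training data: (outlook, temperature) -> play?
examples = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
            (("rain", "mild"), "yes"), (("overcast", "hot"), "yes"),
            (("rain", "cool"), "yes")]
p_v, p_a = naive_bayes_learn(examples)
print(classify_new_instance(("rain", "mild"), p_v, p_a))  # prints "yes"
```

A real implementation would smooth the estimates (e.g. add-one counts) so that an unseen attribute value does not zero out an entire class.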
6
- (due to MIT's OpenCourseWare slides)