1
Uncertainty
  • Chapter 13
2
Outline
  • Uncertainty
  • Probability
  • Syntax and Semantics
  • Inference
  • Independence and Bayes' Rule
3
Uncertainty
  • Let action At = leave for airport t minutes before flight
  • Will At get me there on time?


  • Problems:
    • partial observability (road state, other drivers' plans, etc.)
    • noisy sensors (traffic reports)
    • uncertainty in action outcomes (flat tire, etc.)
    • immense complexity of modeling and predicting traffic

  • Hence a purely logical approach either
    • risks falsehood: “A25 will get me there on time”, or
    • leads to conclusions that are too weak for decision making:

  • “A25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact etc etc.”


  • (A1440 might reasonably be said to get me there on time but I'd have to stay overnight in the airport …)
4
Methods for handling uncertainty
  • Default or nonmonotonic logic:
    • Assume my car does not have a flat tire
    • Assume A25 works unless contradicted by evidence
  • Issues: What assumptions are reasonable? How to handle contradiction?


  • Rules with fudge factors:
    • A25 |→0.3 get there on time
    • Sprinkler |→ 0.99 WetGrass
    • WetGrass |→ 0.7 Rain
  • Issues: Problems with combination, e.g., Sprinkler causes Rain??


  • Probability
    • Model agent's degree of belief
    • Given the available evidence,
    • A25 will get me there on time with probability 0.04
5
Probability
  • Probabilistic assertions summarize effects of
    • laziness: failure to enumerate exceptions, qualifications, etc.
    • ignorance: lack of relevant facts, initial conditions, etc.


  • Subjective probability:
  • Probabilities relate propositions to agent's own state of knowledge
  • e.g., P(A25 | no reported accidents) = 0.06


  • These are not assertions about the world


  • Probabilities of propositions change with new evidence:
  • e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
6
Making decisions under uncertainty
  • Suppose I believe the following:
    • P(A25 gets me there on time | …) = 0.04
    • P(A90 gets me there on time | …) = 0.70
    • P(A120 gets me there on time | …) = 0.95
    • P(A1440 gets me there on time | …) = 0.9999

  • Which action to choose?
  • Depends on my preferences for missing flight vs. time spent waiting, etc.
    • Utility theory is used to represent and infer preferences
    • Decision theory = probability theory + utility theory
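  • A minimal sketch, in Python, of choosing the action with maximum expected utility. The probabilities are the ones above; the utility of catching or missing the flight and the per-minute waiting cost are assumed values used purely for illustration:

```python
# Minimal sketch of decision-theoretic action selection (maximum expected utility).
# The probabilities are from the slide; the utility numbers are made-up assumptions.
p_on_time = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}

U_CATCH_FLIGHT = 1000   # assumed utility of arriving on time
U_MISS_FLIGHT = -500    # assumed utility of missing the flight
COST_PER_MINUTE = 0.5   # assumed cost of each minute spent waiting

def expected_utility(minutes_early):
    p = p_on_time[minutes_early]
    return (p * U_CATCH_FLIGHT
            + (1 - p) * U_MISS_FLIGHT
            - COST_PER_MINUTE * minutes_early)

for t in sorted(p_on_time):
    print(f"A{t}: EU = {expected_utility(t):.1f}")
best = max(p_on_time, key=expected_utility)
print(f"Best action under these assumed utilities: A{best}")
```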
7
Syntax
  • Basic element: random variable


  • Similar to propositional logic: possible worlds defined by assignment of values to random variables.
  • Boolean random variables
    • e.g., Cavity (do I have a cavity?)
  • Discrete random variables
    • e.g., Weather is one of <sunny,rainy,cloudy,snow>
  • Domain values must be exhaustive and mutually exclusive
  • Elementary proposition constructed by assignment of a value to a random variable: e.g., Weather = sunny, Cavity = false
     (abbreviated as ¬cavity)
  • Complex propositions formed from elementary propositions and standard logical connectives, e.g., Weather = sunny ∨ Cavity = false
8
Syntax
  • Atomic event: A complete specification of the state of the world about which the agent is uncertain
    • E.g., if the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic events:
      • Cavity = false ∧ Toothache = false
      • Cavity = false ∧ Toothache = true
      • Cavity = true ∧ Toothache = false
      • Cavity = true ∧ Toothache = true


  • Atomic events are mutually exclusive and exhaustive
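  • A small sketch enumerating these atomic events programmatically; with two Boolean variables, every complete assignment is generated exactly once, so the events are mutually exclusive and exhaustive by construction:

```python
# Sketch: enumerate the atomic events (complete assignments) for two Boolean
# variables. With Cavity and Toothache there are 2 x 2 = 4 of them.
from itertools import product

variables = ["Cavity", "Toothache"]
atomic_events = [dict(zip(variables, values))
                 for values in product([False, True], repeat=len(variables))]

for event in atomic_events:
    print(event)
# {'Cavity': False, 'Toothache': False}
# {'Cavity': False, 'Toothache': True}
# {'Cavity': True, 'Toothache': False}
# {'Cavity': True, 'Toothache': True}
```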
9
Axioms of probability
  • For any propositions A, B
    • 0 ≤ P(A) ≤ 1
    • P(true) = 1 and P(false) = 0
    • P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
10
Prior probability
  • Prior or unconditional probabilities of propositions
    • e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72 correspond to belief prior to arrival of any (new) evidence


  • Probability distribution gives values for all possible assignments:
    • P(Weather) = <0.72,0.1,0.08,0.1> (normalized, i.e., sums to 1)
  • Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables
    • P(Weather,Cavity) = a 4 × 2 matrix of values:


  •   Weather =        sunny    rainy    cloudy    snow
  •   Cavity = true    0.144    0.02     0.016     0.02
  •   Cavity = false   0.576    0.08     0.064     0.08


  • Every question about a domain can be answered by the joint distribution
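  • A brief sketch of the joint distribution P(Weather, Cavity) above stored as a table, with the marginal P(Weather) recovered by summing out Cavity:

```python
# Sketch: the joint distribution P(Weather, Cavity) from the table above,
# stored as a dict keyed by (weather, cavity). Any question about these two
# variables can be answered by summing entries of this table.
joint = {
    ("sunny", True): 0.144,  ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Marginal P(Weather), obtained by summing out Cavity.
p_weather = {}
for (weather, cavity), p in joint.items():
    p_weather[weather] = p_weather.get(weather, 0.0) + p

print({w: round(p, 3) for w, p in p_weather.items()})
# {'sunny': 0.72, 'rainy': 0.1, 'cloudy': 0.08, 'snow': 0.1}
```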
11
Conditional probability
  • Conditional or posterior probabilities
    • e.g., P(cavity | toothache) = 0.8
    • i.e., given that toothache is all I know


  • Notation for conditional distributions:
    • P(Cavity | Toothache) = 2-element vector of 2-element vectors
  • If we know more, e.g., cavity is also given, then we have
    • P(cavity | toothache, cavity) = 1

  • New evidence may be irrelevant, allowing simplification, e.g.,
    • P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
    • This kind of inference, sanctioned by domain knowledge, is crucial
12
Conditional probability
  • Definition of conditional probability:
    • P(a | b) = P(a ∧ b) / P(b) if P(b) > 0


  • Product rule gives an alternative formulation:
    • P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)


  • A general version holds for whole distributions, e.g.,
    • P(Weather,Cavity) = P(Weather | Cavity) P(Cavity)
    • (View as a set of 4 × 2 equations, not matrix mult.)


  • Chain rule is derived by successive application of product rule:
    • P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
    •              = P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
    •              = …
    •              = ∏i=1..n P(Xi | X1, …, Xi-1)
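  • A small numeric sketch of the definition of conditional probability and the product rule, using an arbitrary made-up joint over two Boolean propositions:

```python
# Sketch: checking P(a|b) = P(a ∧ b) / P(b) and the product rule on an
# arbitrary made-up joint over two Boolean propositions a and b
# (the numbers are chosen only so that they sum to 1).
joint = {(True, True): 0.30, (True, False): 0.20,
         (False, True): 0.10, (False, False): 0.40}

def P(pred):
    """Probability that predicate pred holds, by summing atomic events."""
    return sum(p for (a, b), p in joint.items() if pred(a, b))

p_a_and_b = P(lambda a, b: a and b)
p_a = P(lambda a, b: a)
p_b = P(lambda a, b: b)

p_a_given_b = p_a_and_b / p_b        # definition of conditional probability
p_b_given_a = p_a_and_b / p_a

# Product rule: both factorizations recover the same joint entry.
assert abs(p_a_given_b * p_b - p_a_and_b) < 1e-12
assert abs(p_b_given_a * p_a - p_a_and_b) < 1e-12
print(round(p_a_given_b, 3))   # 0.75
```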
13
Inference by enumeration
  • Start with the joint probability distribution:
                      toothache              ¬toothache
                   catch     ¬catch       catch     ¬catch
      cavity       0.108     0.012        0.072     0.008
      ¬cavity      0.016     0.064        0.144     0.576
  • For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
14
Inference by enumeration
  • Start with the joint probability distribution:
      (same full joint distribution table for Toothache, Catch, Cavity as above)
  • For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
  • P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2


15
Inference by enumeration
  • Start with the joint probability distribution:
      (same full joint distribution table for Toothache, Catch, Cavity as above)
  • For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
  • P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
  • P(toothache ∨ cavity) = 0.108 + 0.012 + 0.016 + 0.064 +
          0.072 + 0.008 = 0.28
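  • A sketch of this enumeration over the full joint distribution shown above, with P(φ) computed by summing the atomic events in which φ holds:

```python
# Sketch: inference by enumeration over the full joint distribution for
# Toothache, Catch, Cavity shown above. P(φ) is the sum of the probabilities
# of the atomic events in which φ is true.
joint = {  # keys: (toothache, catch, cavity)
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(phi):
    """Sum the probabilities of the atomic events where phi holds."""
    return sum(p for event, p in joint.items() if phi(*event))

print(round(P(lambda toothache, catch, cavity: toothache), 2))             # 0.2
print(round(P(lambda toothache, catch, cavity: toothache or cavity), 2))   # 0.28
```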



16
Inference by enumeration
  • Start with the joint probability distribution:
      (same full joint distribution table for Toothache, Catch, Cavity as above)
  • Can also compute conditional probabilities:
  • P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
  •                        = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
  •                        = 0.08 / 0.2 = 0.4


17
Normalization
  • Denominator can be viewed as a normalization constant α


  • P(Cavity | toothache) = α · P(Cavity, toothache)
    • = α · [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
    • = α · [<0.108, 0.016> + <0.012, 0.064>]
    • = α · <0.12, 0.08> = <0.6, 0.4>


  • General idea: compute distribution on query variable by fixing evidence variables and summing over hidden variables
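  • A short sketch of this normalization step on the same joint distribution: fix toothache as the evidence, sum out the hidden variable Catch, then scale by α so the result sums to 1:

```python
# Sketch of the normalization trick: keep only the entries consistent with the
# evidence (toothache = true), sum out the hidden variable (Catch), then divide
# by the total. Same full joint as in the enumeration sketch above.
joint = {  # keys: (toothache, catch, cavity)
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

unnormalized = {True: 0.0, False: 0.0}            # indexed by the value of Cavity
for (toothache, catch, cavity), p in joint.items():
    if toothache:                                  # consistent with the evidence
        unnormalized[cavity] += p                  # Catch is summed out here

alpha = 1.0 / sum(unnormalized.values())           # 1 / P(toothache) = 1 / 0.2 = 5
posterior = {v: alpha * p for v, p in unnormalized.items()}
print({v: round(p, 3) for v, p in posterior.items()})   # {True: 0.6, False: 0.4}
```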
18
Inference by enumeration, contd.
  • Typically, we are interested in
  • the posterior joint distribution of the query variables Y
  • given specific values e for the evidence variables E


  • Let the hidden variables be H = X - Y - E


  • Then the required summation of joint entries is done by summing out the hidden variables:
    • P(Y | E = e) = α P(Y, E = e) = α Σh P(Y, E = e, H = h)

  • The terms in the summation are joint entries because Y, E and H together exhaust the set of random variables
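  • A sketch of this general scheme, assuming the joint is stored as a table mapping complete assignments to probabilities (the variable names and the helper enumerate_query are illustrative, not from any particular library):

```python
# Sketch of P(Y | E=e) = α Σh P(Y, E=e, H=h): loop over every atomic event,
# keep those consistent with the evidence, accumulate probability by the value
# of the query variable (hidden variables are summed out implicitly), normalize.
VARS = ("toothache", "catch", "cavity")
joint = {  # full joint distribution from the previous slides, keyed in VARS order
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def enumerate_query(query_var, evidence, joint, variables=VARS):
    """Posterior distribution P(query_var | evidence) by enumeration."""
    dist = {}
    for values, p in joint.items():
        event = dict(zip(variables, values))
        if all(event[var] == val for var, val in evidence.items()):
            y = event[query_var]
            dist[y] = dist.get(y, 0.0) + p
    alpha = 1.0 / sum(dist.values())               # normalization constant
    return {y: alpha * p for y, p in dist.items()}

result = enumerate_query("cavity", {"toothache": True}, joint)
print({v: round(p, 3) for v, p in result.items()})   # {True: 0.6, False: 0.4}
```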


  • Obvious problems:
    • Worst-case time complexity O(d^n), where d is the largest arity and n the number of variables
    • Space complexity O(d^n) to store the joint distribution
    • How to find the numbers for O(d^n) entries?
19
Independence
  • A and B are independent iff
  • P(A|B) = P(A)    or P(B|A) = P(B)     or P(A, B) = P(A) P(B)





    • P(Toothache, Catch, Cavity, Weather)
    • = P(Toothache, Catch, Cavity) P(Weather)


  • 32 entries reduced to 12 (8 + 4);
    for n independent biased coins, O(2^n) → O(n)
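  • A small sketch of this factorization: store the 8-entry dental table and the 4-entry weather table (12 numbers) and recover all 32 joint entries as products:

```python
# Sketch: with Weather independent of the dental variables, the 32-entry joint
# P(Toothache, Catch, Cavity, Weather) is the product of an 8-entry table and a
# 4-entry table, so only 8 + 4 = 12 numbers need to be stored.
p_dental = {  # P(Toothache, Catch, Cavity), keyed by (toothache, catch, cavity)
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
p_weather = {"sunny": 0.72, "rainy": 0.1, "cloudy": 0.08, "snow": 0.1}

# Recover every entry of the full joint as a product of the two factors.
full_joint = {(dental, w): p_d * p_w
              for dental, p_d in p_dental.items()
              for w, p_w in p_weather.items()}

print(len(p_dental) + len(p_weather))                  # 12 numbers stored
print(len(full_joint))                                 # 32 entries recovered
print(abs(sum(full_joint.values()) - 1.0) < 1e-9)      # still sums to 1: True
```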


  • Absolute independence powerful but rare


  • Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
20
Conditional independence
  • P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries


  • If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
    • (1) P(catch | toothache, cavity) = P(catch | cavity)


  • The same independence holds if I haven't got a cavity:
    • (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
  • Catch is conditionally independent of Toothache given Cavity:
    • P(Catch | Toothache, Cavity) = P(Catch | Cavity)


  • Equivalent statements:
    • P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
    • P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
21
Conditional independence contd.
  • Write out full joint distribution using chain rule:
  • P(Toothache, Catch, Cavity)
    • = P(Toothache | Catch, Cavity) P(Catch, Cavity)
    • = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
    • = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
  • I.e., 2 + 2 + 1 = 5 independent numbers
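  • A small sketch of this factorization, using conditional values consistent with the running dental example; multiplying the 5 numbers back out reproduces the full 8-entry joint:

```python
# Sketch: P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
# P(Cavity) needs only 2 + 2 + 1 = 5 numbers. The values below are read off the
# running example; e.g. 0.6 * 0.9 * 0.2 = 0.108 for toothache ∧ catch ∧ cavity.
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity = v)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch     | Cavity = v)

joint = {}
for cavity in (True, False):
    for toothache in (True, False):
        for catch in (True, False):
            p = p_cavity if cavity else 1 - p_cavity
            p *= p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
            p *= p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
            joint[(toothache, catch, cavity)] = p

print(round(joint[(True, True, True)], 3))     # 0.108
print(round(joint[(False, False, False)], 3))  # 0.576
```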


  • In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.


  • Conditional independence is our most basic and robust form of knowledge about uncertain environments.
22
Bayes' Rule
  • Product rule P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
  • ⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)


  • or in distribution form
  • P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y)


  • Useful for assessing diagnostic probability from causal probability:
    • P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)


    • E.g., let M be meningitis, S be stiff neck:
      • P(m|s) = P(s|m) P(m) / P(s) = 0.5 × 0.00002 / 0.05 = 0.0002
    • Note: posterior probability of meningitis still very small!
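  • The same calculation as a one-line sketch:

```python
# Sketch: the meningitis example as a direct application of Bayes' rule,
# P(m | s) = P(s | m) P(m) / P(s), with the numbers from this slide.
p_s_given_m = 0.5       # P(stiff neck | meningitis)
p_m = 0.00002           # prior P(meningitis), i.e. 1 in 50,000
p_s = 0.05              # P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 6))   # 0.0002 — the posterior is still very small
```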
23
Bayes' Rule and conditional independence
  • P(Cavity | toothache ∧ catch)
    • = α · P(toothache ∧ catch | Cavity) P(Cavity)
    • = α · P(toothache | Cavity) P(catch | Cavity) P(Cavity)
  • This is an example of a naïve Bayes model:
    • P(Cause, Effect1, …, Effectn) = P(Cause) ∏i P(Effecti | Cause)




  • Total number of parameters is linear in n
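  • A short sketch of this naïve Bayes computation, reusing conditional values consistent with the dental example:

```python
# Sketch: naïve Bayes inference P(Cavity | toothache ∧ catch)
# = α P(toothache | Cavity) P(catch | Cavity) P(Cavity),
# with the same conditional tables as in the conditional-independence sketch.
p_cavity          = {True: 0.2, False: 0.8}
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity = v)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch     | Cavity = v)

unnormalized = {v: p_toothache_given[v] * p_catch_given[v] * p_cavity[v]
                for v in (True, False)}
alpha = 1.0 / sum(unnormalized.values())
posterior = {v: alpha * p for v, p in unnormalized.items()}
print({v: round(p, 3) for v, p in posterior.items()})   # {True: 0.871, False: 0.129}
```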
24
Summary
  • Probability is a rigorous formalism for uncertain knowledge
  • Joint probability distribution specifies probability of every atomic event
  • Queries can be answered by summing over atomic events
  • For nontrivial domains, we must find a way to reduce the joint size
  • Independence and conditional independence provide the tools