1

2
- Uncertainty
- Probability
- Syntax and Semantics
- Inference
- Independence and Bayes' Rule
3
- Let action At = leave for airport t minutes before flight
- Will At get me there on time?
- Problems:
- partial observability (road state, other drivers' plans, etc.)
- noisy sensors (traffic reports)
- uncertainty in action outcomes (flat tire, etc.)
- immense complexity of modeling and predicting traffic
- Hence a purely logical approach either
- risks falsehood: “A25 will get me there on time”, or
- leads to conclusions that are too weak for decision making: “A25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact etc etc.”
- (A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport …)
4
- Default or nonmonotonic logic:
- Assume my car does not have a flat tire
- Assume A25 works unless contradicted by evidence
- Issues: What assumptions are reasonable? How to handle contradiction?
- Rules with fudge factors:
- A25 ↦0.3 get there on time
- Sprinkler ↦0.99 WetGrass
- WetGrass ↦0.7 Rain
- Issues: Problems with combination, e.g., Sprinkler causes Rain??
- Probability
- Model agent's degree of belief
- Given the available evidence, A25 will get me there on time with probability 0.04
5
- Probabilistic assertions summarize effects of
- laziness: failure to enumerate exceptions, qualifications, etc.
- ignorance: lack of relevant facts, initial conditions, etc.
- Subjective probability:
- Probabilities relate propositions to agent's own state of knowledge
- e.g., P(A25 | no reported accidents) = 0.06
- These are not assertions about the world
- Probabilities of propositions change with new evidence:
- e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
6
- Suppose I believe the following:
- P(A25 gets me there on time | …) = 0.04
- P(A90 gets me there on time | …) = 0.70
- P(A120 gets me there on time | …) = 0.95
- P(A1440 gets me there on time | …) = 0.9999
- Which action to choose?
- Depends on my preferences for missing flight vs. time spent waiting, etc.
- Utility theory is used to represent and infer preferences
- Decision theory = probability theory + utility theory
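As a rough sketch of how decision theory combines the two, here is a small Python example. The probabilities are from the slide; the utility and waiting-cost numbers are hypothetical, invented purely for illustration:

```python
# Pick the action with maximum expected utility:
# EU(a) = P(on time | a) * U(on time) + P(late | a) * U(late) + cost of waiting.
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}
wait_cost = {"A25": 0, "A90": -10, "A120": -20, "A1440": -300}  # hypothetical
U_MAKE, U_MISS = 1000, 0                                        # hypothetical

def expected_utility(a):
    p = p_on_time[a]
    return p * U_MAKE + (1 - p) * U_MISS + wait_cost[a]

print(max(p_on_time, key=expected_utility))  # A120, under these preferences
```

With different utilities (e.g., a much higher cost for missing the flight), a different action wins; the probabilities alone do not determine the choice.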
7
- Basic element: random variable
- Similar to propositional logic: possible worlds defined by assignment of values to random variables
- Boolean random variables
- e.g., Cavity (do I have a cavity?)
- Discrete random variables
- e.g., Weather is one of <sunny, rainy, cloudy, snow>
- Domain values must be exhaustive and mutually exclusive
- Elementary propositions are constructed by assignment of a value to a random variable: e.g., Weather = sunny, Cavity = false (abbreviated as ¬cavity)
- Complex propositions are formed from elementary propositions and standard logical connectives, e.g., Weather = sunny ∨ Cavity = false
8
- Atomic event: a complete specification of the state of the world about which the agent is uncertain
- E.g., if the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic events:
- Cavity = false ∧ Toothache = false
- Cavity = false ∧ Toothache = true
- Cavity = true ∧ Toothache = false
- Cavity = true ∧ Toothache = true
- Atomic events are mutually exclusive and exhaustive
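A minimal Python sketch of this enumeration (variable names are just illustrative):

```python
from itertools import product

# An atomic event assigns a value to every random variable.
domains = {"Cavity": [False, True], "Toothache": [False, True]}
atomic_events = [dict(zip(domains, values))
                 for values in product(*domains.values())]
print(len(atomic_events))  # 4 mutually exclusive, exhaustive events
```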
9
- For any propositions A, B
- 0 ≤ P(A) ≤ 1
- P(true) = 1 and P(false) = 0
- P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
10
- Prior or unconditional probabilities of propositions
- e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72 correspond to belief prior to arrival of any (new) evidence
- Probability distribution gives values for all possible assignments:
- P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)
- Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables
- P(Weather, Cavity) = a 4 × 2 matrix of values:

                     Weather = sunny   rainy   cloudy   snow
      Cavity = true            0.144   0.02    0.016    0.02
      Cavity = false           0.576   0.08    0.064    0.08
- Every question about a domain can be answered by the joint distribution
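As a sketch, the joint above as a Python dict; any prior probability is then a sum of entries:

```python
# P(Weather, Cavity), keyed by (weather, cavity) pairs.
joint = {("sunny", True): 0.144, ("rainy", True): 0.02,
         ("cloudy", True): 0.016, ("snow", True): 0.02,
         ("sunny", False): 0.576, ("rainy", False): 0.08,
         ("cloudy", False): 0.064, ("snow", False): 0.08}

assert abs(sum(joint.values()) - 1.0) < 1e-9   # a proper distribution
p_sunny = sum(p for (w, _), p in joint.items() if w == "sunny")
print(p_sunny)  # 0.72, i.e., P(Weather = sunny)
```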
11
- Conditional or posterior probabilities
- e.g., P(cavity | toothache) = 0.8
- i.e., given that toothache is all I know
- Notation for conditional distributions:
- P(Cavity | Toothache) = 2-element vector of 2-element vectors
- If we know more, e.g., cavity is also given, then we have
- P(cavity | toothache, cavity) = 1
- New evidence may be irrelevant, allowing simplification, e.g.,
- P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
- This kind of inference, sanctioned by domain knowledge, is crucial
12
- Definition of conditional probability:
- P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
- Product rule gives an alternative formulation:
- P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
- A general version holds for whole distributions, e.g.,
- P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
- (View as a set of 4 × 2 equations, not matrix mult.)
- Chain rule is derived by successive application of the product rule:
- P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
- = P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
- = …
- = ∏i=1..n P(Xi | X1, …, Xi-1)
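A quick numerical check of the product rule against the P(Weather, Cavity) table given earlier (reusing the `joint` dict from that sketch):

```python
# P(sunny ∧ cavity) should equal P(sunny | cavity) P(cavity).
p_cavity = sum(p for (_, c), p in joint.items() if c)        # 0.2
p_sunny_given_cavity = joint[("sunny", True)] / p_cavity     # 0.72
assert abs(joint[("sunny", True)] - p_sunny_given_cavity * p_cavity) < 1e-9
```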
13
- Start with the joint probability distribution:
- For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
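- For reference, the full joint distribution P(Toothache, Catch, Cavity) used on this and the following slides (entries are consistent with the sums quoted below and total 1.0):

                     toothache            ¬toothache
                     catch    ¬catch      catch    ¬catch
      cavity         0.108    0.012       0.072    0.008
      ¬cavity        0.016    0.064       0.144    0.576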
14
- Start with the joint probability distribution:
- For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
- P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
15
- Start with the joint probability distribution:
- For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
- P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
- P(toothache ∨ cavity) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28
16
- Start with the joint probability distribution:
- Can also compute conditional probabilities:
- P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
- = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
- = 0.08 / 0.2 = 0.4
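The same computation by enumeration in Python, a sketch over the joint table above (worlds are keyed (cavity, toothache, catch)):

```python
# Full joint distribution P(Cavity, Toothache, Catch).
joint3 = {(True, True, True): 0.108,   (True, True, False): 0.012,
          (True, False, True): 0.072,  (True, False, False): 0.008,
          (False, True, True): 0.016,  (False, True, False): 0.064,
          (False, False, True): 0.144, (False, False, False): 0.576}

def prob(event):
    """P(φ): sum over the atomic events consistent with a partial assignment."""
    return sum(p for world, p in joint3.items()
               if all(world[i] == v for i, v in event.items()))

# Indices: 0 = Cavity, 1 = Toothache, 2 = Catch.
print(prob({0: False, 1: True}) / prob({1: True}))  # 0.4
```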
17
- Denominator can be viewed as a normalization constant α
- P(Cavity | toothache) = α · P(Cavity, toothache)
- = α · [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
- = α · [<0.108, 0.016> + <0.012, 0.064>]
- = α · <0.12, 0.08> = <0.6, 0.4>
- General idea: compute distribution on query variable by fixing evidence variables and summing over hidden variables
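Normalization in code, continuing with `prob` from the enumeration sketch above:

```python
# Unnormalized values of P(Cavity, toothache) for Cavity = true / false.
unnorm = [prob({0: True, 1: True}), prob({0: False, 1: True})]  # [0.12, 0.08]
alpha = 1 / sum(unnorm)                                         # 1 / 0.2 = 5
print([alpha * p for p in unnorm])                              # [0.6, 0.4]
```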
18
- Typically, we are interested in
- the posterior joint distribution of the query variables Y
- given specific values e for the evidence variables E
- Let the hidden variables be H = X - Y - E
- Then the required summation of joint entries is done by summing out the hidden variables:
- P(Y | E = e) = α P(Y, E = e) = α Σh P(Y, E = e, H = h)
- The terms in the summation are joint entries because Y, E, and H together exhaust the set of random variables
- Obvious problems:
- Worst-case time complexity O(d^n), where d is the largest arity
- Space complexity O(d^n) to store the joint distribution
- How to find the numbers for O(d^n) entries?
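A generic sketch of this query procedure (names are illustrative; `joint3` is the dentist joint from the enumeration sketch):

```python
def enumerate_ask(query_var, evidence, joint):
    """P(Y | E = e): fix evidence, sum out hidden variables, normalize."""
    dist = {}
    for world, p in joint.items():
        if all(world[i] == v for i, v in evidence.items()):
            y = world[query_var]
            dist[y] = dist.get(y, 0.0) + p   # hidden variables summed out here
    alpha = 1 / sum(dist.values())
    return {y: alpha * p for y, p in dist.items()}

print(enumerate_ask(0, {1: True}, joint3))   # {True: 0.6, False: 0.4}
```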
19
- A and B are independent iff
- P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B)
- P(Toothache, Catch, Cavity, Weather)
- = P(Toothache, Catch, Cavity) P(Weather)
- 32 entries reduced to 12 (8 + 4); for n independent biased coins, O(2^n) → O(n)
- Absolute independence powerful but rare
- Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
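Incidentally, the P(Weather, Cavity) table from the prior-probability slide does satisfy absolute independence, which a short check confirms (reusing its `joint` dict):

```python
# Check P(W, C) = P(W) P(C) for every entry of the Weather/Cavity joint.
p_cavity = sum(p for (_, c), p in joint.items() if c)
for (w, c), p in joint.items():
    p_w = sum(q for (w2, _), q in joint.items() if w2 == w)
    p_c = p_cavity if c else 1 - p_cavity
    assert abs(p - p_w * p_c) < 1e-9
```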
20
- P(Toothache, Cavity, Catch) has 2^3 - 1 = 7 independent entries
- If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
- (1) P(catch | toothache, cavity) = P(catch | cavity)
- The same independence holds if I haven't got a cavity:
- (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
- Catch is conditionally independent of Toothache given Cavity:
- P(Catch | Toothache, Cavity) = P(Catch | Cavity)
- Equivalent statements:
- P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
- P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
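Equalities (1) and (2) can be checked numerically against the full joint table (using `prob` from the enumeration sketch):

```python
# (1) P(catch | toothache, cavity) = P(catch | cavity): both 0.9
lhs = prob({2: True, 1: True, 0: True}) / prob({1: True, 0: True})
rhs = prob({2: True, 0: True}) / prob({0: True})
assert abs(lhs - rhs) < 1e-9
# (2) the same given ¬cavity: both 0.2
lhs = prob({2: True, 1: True, 0: False}) / prob({1: True, 0: False})
rhs = prob({2: True, 0: False}) / prob({0: False})
assert abs(lhs - rhs) < 1e-9
```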
21
- Write out full joint distribution using chain rule:
- P(Toothache, Catch, Cavity)
- = P(Toothache | Catch, Cavity) P(Catch, Cavity)
- = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
- = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
- I.e., 2 + 2 + 1 = 5 independent numbers
- In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
- Conditional independence is our most basic and robust form of knowledge about uncertain environments.
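A sketch of the 5-number representation: the conditional probabilities below are read off the joint table (P(cavity) = 0.2, P(toothache | Cavity) = 0.6 / 0.1, P(catch | Cavity) = 0.9 / 0.2), and the factorization reproduces all 8 joint entries:

```python
from itertools import product

p_cavity = 0.2                           # 1 number
p_tooth = {True: 0.6, False: 0.1}        # 2 numbers: P(toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}        # 2 numbers: P(catch | Cavity)

def p(cav, tooth, catch):
    """P(Toothache, Catch, Cavity) = P(T | C) P(K | C) P(C)."""
    pc = p_cavity if cav else 1 - p_cavity
    pt = p_tooth[cav] if tooth else 1 - p_tooth[cav]
    pk = p_catch[cav] if catch else 1 - p_catch[cav]
    return pt * pk * pc

rebuilt = {w: p(*w) for w in product([True, False], repeat=3)}
print(rebuilt[(True, True, True)])  # 0.6 * 0.9 * 0.2 = 0.108, as in the table
```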
22
- Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
- ⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
- or in distribution form:
- P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
- Useful for assessing diagnostic probability from causal probability:
- P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)
- E.g., let M be meningitis, S be stiff neck:
- P(m|s) = P(s|m) P(m) / P(s) = 0.5 × 0.00002 / 0.05 = 0.0002
- Note: posterior probability of meningitis still very small!
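The same computation in Python:

```python
p_s_given_m, p_m, p_s = 0.5, 0.00002, 0.05   # P(m) = 1/50,000
print(p_s_given_m * p_m / p_s)               # 0.0002
```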
23
- P(Cavity | toothache ∧ catch)
- = α · P(toothache ∧ catch | Cavity) P(Cavity)
- = α · P(toothache | Cavity) P(catch | Cavity) P(Cavity)
- This is an example of a naïve Bayes model:
- P(Cause, Effect1, …, Effectn) = P(Cause) ∏i P(Effecti | Cause)
- Total number of parameters is linear in n
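A naïve Bayes sketch for the dentist example, using the same 5 parameters as the factored representation above:

```python
p_cavity = {True: 0.2, False: 0.8}       # P(Cavity)
p_tooth = {True: 0.6, False: 0.1}        # P(toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}        # P(catch | Cavity)

# P(Cavity | toothache ∧ catch) ∝ P(toothache | C) P(catch | C) P(C)
unnorm = {c: p_tooth[c] * p_catch[c] * p_cavity[c] for c in (True, False)}
alpha = 1 / sum(unnorm.values())
print({c: round(alpha * v, 3) for c, v in unnorm.items()})
# {True: 0.871, False: 0.129}
```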
24
- Probability is a rigorous formalism for uncertain knowledge
- Joint probability distribution specifies probability of every atomic event
- Queries can be answered by summing over atomic events
- For nontrivial domains, we must find a way to reduce the joint size
- Independence and conditional independence provide the tools