1	Bayesian networks: Chapter 14, Sections 1–2
2	Outline: Syntax; Semantics
3	Bayesian networks: A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions. Syntax: (1) a set of nodes, one per variable; (2) a directed, acyclic graph (link ≈ "directly influences"); (3) a conditional distribution for each node given its parents, $P(X_i \mid \mathrm{Parents}(X_i))$. In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.
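The three syntactic ingredients can be sketched as a plain data structure. This is a minimal sketch, not a standard library format: the `network` layout and the helper names `is_acyclic` and `cpts_are_complete` are hypothetical, all variables are assumed Boolean, and the probabilities are illustrative placeholders rather than values from the slides.

```python
from itertools import product

# Hypothetical encoding: node -> (parents, cpt), where cpt maps a tuple of
# parent values to P(node = True | parents). All variables are Boolean.
# The probabilities are illustrative placeholders, not taken from the slides.
network = {
    "Cavity":    ((), {(): 0.2}),
    "Toothache": (("Cavity",), {(True,): 0.6, (False,): 0.1}),
    "Catch":     (("Cavity",), {(True,): 0.9, (False,): 0.2}),
}

def is_acyclic(net):
    """Return True if the parent links form a directed acyclic graph."""
    done, in_progress = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in in_progress:  # node is already on the current path: cycle
            return False
        in_progress.add(node)
        ok = all(visit(parent) for parent in net[node][0])
        in_progress.discard(node)
        if ok:
            done.add(node)
        return ok

    return all(visit(node) for node in net)

def cpts_are_complete(net):
    """Check that each CPT has exactly one row per combination of parent values."""
    for parents, cpt in net.values():
        expected_rows = set(product([True, False], repeat=len(parents)))
        if set(cpt) != expected_rows:
            return False
        if not all(0.0 <= p <= 1.0 for p in cpt.values()):
            return False
    return True

print(is_acyclic(network), cpts_are_complete(network))
```

Storing only P(node = True | parents) per row is enough for Boolean variables, since the probability of False is its complement.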
4	Example: The topology of the network encodes conditional independence assertions: Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity.
5	Example: I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. The alarm is sometimes set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. Network topology reflects "causal" knowledge: a burglar can set the alarm off; an earthquake can set the alarm off; the alarm can cause Mary to call; the alarm can cause John to call.
6	Example contd.
7	Compactness: A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows, one for each combination of parent values. Each row requires one number $p$ for $X_i = \mathit{true}$ (the number for $X_i = \mathit{false}$ is just $1 - p$). If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers, i.e., it grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution. For the burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).
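The counting argument can be checked with a few lines of code. This is a small sketch assuming all variables are Boolean; the function names `network_size` and `full_joint_size` are hypothetical, and the parent lists follow the burglary topology described earlier.

```python
def network_size(parents):
    """Numbers needed by a Boolean network: 2^k CPT rows for a node with k parents."""
    return sum(2 ** len(ps) for ps in parents.values())

def full_joint_size(n):
    """Independent numbers in the full joint distribution over n Boolean variables."""
    return 2 ** n - 1

burglary_parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

print(network_size(burglary_parents))           # 10
print(full_joint_size(len(burglary_parents)))   # 31

# Hypothetical larger case: n = 30 variables with at most k = 5 parents each.
n, k = 30, 5
print(n * 2 ** k, full_joint_size(n))           # 960 1073741823
```

The last line illustrates why the linear growth matters: the bound on the network stays under a thousand numbers while the full joint needs over a billion.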
8	Semantics: The full joint distribution is defined as the product of the local conditional distributions: $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$, e.g., $P(j \land m \land a \land \lnot b \land \lnot e) = P(j \mid a)\, P(m \mid a)\, P(a \mid \lnot b, \lnot e)\, P(\lnot b)\, P(\lnot e)$.
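The product formula can be evaluated directly for the burglary example. Note the assumption: the CPT numbers below are the values usually quoted for this textbook example; the slide that shows them is not reproduced in this text, so treat them as assumed inputs. The names `bernoulli` and `joint` are hypothetical helpers.

```python
# ASSUMED CPT values (commonly quoted for the burglary example; not in this text).
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {                          # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)

def bernoulli(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full joint entry as the product of the local conditional distributions."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))

# P(j, m, a, not b, not e) = P(j|a) P(m|a) P(a|not b, not e) P(not b) P(not e)
p = joint(b=False, e=False, a=True, j=True, m=True)
print(round(p, 6))  # 0.000628
```

Under these assumed numbers the product is 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.000628.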
9	Constructing Bayesian networks: 1. Choose an ordering of variables $X_1, \ldots, X_n$. 2. For $i = 1$ to $n$: add $X_i$ to the network; select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid \mathrm{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$. This choice of parents guarantees: $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule) $= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$ (by construction).
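The construction procedure can be sketched as a brute-force search. This is an illustration only, not how networks are built in practice: it assumes the full joint distribution is already available (here generated from assumed burglary CPT values that do not appear in this text) and tests the parent-selection condition numerically. The names `prob`, `cond_true`, `select_parents`, and `build` are hypothetical.

```python
from itertools import combinations, product

VARS = ("B", "E", "A", "J", "M")  # Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

# ASSUMED CPT values (commonly quoted for the burglary example; not in this text).
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    return (bernoulli(0.001, b) * bernoulli(0.002, e) * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))

# Full joint table keyed by value tuples in VARS order.
JOINT = {vals: joint(*vals) for vals in product([True, False], repeat=len(VARS))}

def prob(assignment):
    """Marginal probability of a partial assignment {variable: value}."""
    return sum(p for vals, p in JOINT.items()
               if all(vals[VARS.index(v)] == x for v, x in assignment.items()))

def cond_true(x, given):
    """P(x = True | given) computed from the joint table."""
    return prob({**given, x: True}) / prob(given)

def select_parents(x, predecessors, tol=1e-9):
    """Smallest subset S of predecessors with P(x | S) = P(x | predecessors)."""
    for size in range(len(predecessors) + 1):
        for subset in combinations(predecessors, size):
            matches = all(
                abs(cond_true(x, dict(zip(predecessors, vals)))
                    - cond_true(x, {v: val for v, val in zip(predecessors, vals)
                                    if v in subset})) <= tol
                for vals in product([True, False], repeat=len(predecessors))
            )
            if matches:
                return subset
    return tuple(predecessors)

def build(ordering):
    """Step 2 of the procedure: add each variable and pick its parents in turn."""
    return {x: select_parents(x, ordering[:i]) for i, x in enumerate(ordering)}

print(build(("B", "E", "A", "J", "M")))
# {'B': (), 'E': (), 'A': ('B', 'E'), 'J': ('A',), 'M': ('A',)}
print(build(("M", "J", "A", "B", "E")))
# {'M': (), 'J': ('M',), 'A': ('M', 'J'), 'B': ('A',), 'E': ('A', 'B')}
```

With the causal ordering the search recovers the original topology; with the ordering M, J, A, B, E it yields a denser network, which is the case worked through in the example that follows.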
14	Example: Suppose we choose the ordering M, J, A, B, E. $P(J \mid M) = P(J)$? No. $P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No. $P(B \mid A, J, M) = P(B \mid A)$? Yes. $P(B \mid A, J, M) = P(B)$? No. $P(E \mid B, A, J, M) = P(E \mid A)$? No. $P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes.
15	Example contd.: Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) The network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed.
16	Summary: Bayesian networks provide a natural representation for (causally induced) conditional independence. Topology + CPTs = compact representation of the joint distribution. Generally easy for domain experts to construct.