Unit 3 -- Probability
Oct 18, 2011

Video 3.1  Bayes Network

A Bayes network is a compact representation of a distribution over a large joint distribution.
It is composed of nodes, which are known/unknown events, i.e. random variables,
connected by directed "arcs" (links) indicating that a parent node has a
probabilistic influence over a child node.
From the net, one can make observations (e.g. car won't start and lights don't work)
and compute the probability of particular hypotheses (causes).

see image: bayesnet_intro.jpg

Example:
    observation: car won't start
    hypotheses: bad battery, no oil, no gas, ...
    can make measurements: battery meter, gas gauge, dipstick
    some measurements may be affected by a common cause:
        a bad battery affects the battery meter and gas gauge, but not the dipstick

Bayes networks are used for Diagnostics, Prediction, Machine Learning,
and in Finance, Google, Robotics.
They are components of Particle Filters, Hidden Markov Models, MDPs, POMDPs,
Kalman filters, ...

Outline for the rest of the unit:
    1. Will use discrete binary events
    2. Probability review
    3. Simple Bayes networks
    4. Conditional independence
    5. General Bayes networks
    6. D-separation
    7. Parameter counts
    8. Next unit: Probabilistic Inference

Videos 3.2-3.7  Probabilities

For a coin flip the results are heads or tails; for a fair coin:
    P(H) = 0.5, P(T) = 0.5
Probabilities add up to 1.0, so if P(H) = 0.5, then P(T) = 1 - 0.5.

Independence:
    X independent of Y:  P(X) * P(Y) = P(X,Y)
    the product of the marginals -- P(X) times P(Y) -- equals the joint probability P(X,Y)

So for multiple independent trials, probabilities multiply:
    P(H,H,H) = 1/2 * 1/2 * 1/2 = 1/8   (0.125 chance of getting three heads in a row)
    note: this is combinatorics -- three things that can each be in 2 states = 2^3 possibilities

Flip four times:
    Xi is the result of the ith flip, where Xi = {H,T} and Pi(H) = 0.5 for any flip
    (the result of the ith flip can be H or T, and the probability of heads is 0.5 on every flip)

P(X1 = X2 = X3 = X4)?
    What is the probability that all flips give the SAME result: all H or all T?
    two ways to get it: HHHH, TTTT, each with P = 1/16
    so: P = 1/16 + 1/16 = 1/8, or 0.125

P(X1,X2,X3,X4 has at least 3 H)?
    What is the probability that in four flips we get at least 3 H?
    five ways to get it, each with P = 1/16
    so: P = 5/16, or 0.3125   (HHHH, HHHT, HHTH, HTHH, THHH)

note: for combinations, see the Khan unit:
    http://www.khanacademy.org/video/exactly-three-heads-in-five-flips?playlist=Probability
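Not from the lecture -- a quick Python sketch that checks the two four-flip quizzes by
brute-force enumeration of the 16 equally likely outcomes (variable names are mine):

    from itertools import product

    outcomes = list(product("HT", repeat=4))          # all 2^4 = 16 sequences of four flips

    # P(X1 = X2 = X3 = X4): all flips show the same face
    p_all_same = sum(1 for o in outcomes if len(set(o)) == 1) / len(outcomes)

    # P(at least 3 heads in four flips)
    p_three_heads = sum(1 for o in outcomes if o.count("H") >= 3) / len(outcomes)

    print(p_all_same)     # 0.125  = 1/8
    print(p_three_heads)  # 0.3125 = 5/16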
Unit 3.3  Video 3.8  Dependence

Two flips of different coins, where the first coin is fair and its result decides the second coin:
    Heads means we next flip a 90% Heads coin
    Tails means we next flip an 80% Tails coin
What is the probability that the second flip comes up Heads?

    P(X1=H) = 1/2
    if X1 == H then P(X2=H | X1=H) = 0.9
    if X1 == T then P(X2=T | X1=T) = 0.8
        where the last equation reads: the probability that flip-2 is Tails,
        GIVEN that flip-1 was Tails, equals 0.8

The answer is 0.55, because:

    P(X2=H) = P(X2=H | X1=H) * P(X1=H)     [prob that X1=H and X2=H]
            + P(X2=H | X1=T) * P(X1=T)     [plus prob that X1=T and X2=H]

    The probability that X2 will be heads equals the probability that X2 is heads
    given that X1 was heads (weighted by how likely that was), plus the probability
    that X2 is heads given that X1 was tails (weighted likewise).
    Note that the probability that the 80% Tails coin comes up heads is 20%, i.e. 1 - P(T).

    So the numbers are:
        0.9 * 0.5     (90% coin used 50% of the time)
      + 0.2 * 0.5     (20% coin used 50% of the time)
      = 0.45 + 0.1 = 0.55

Lessons

Total probability:
    P(Y) = {sum over i of} P(Y | X=i) * P(X=i)
    the total probability of Y equals the sum of (the probability of Y given X=i)
    times (the probability of X=i), over all values of i

Negation of a probability:
    P(~X | Y) = 1 - P(X | Y)
    the probability of NOT X given Y equals one minus the probability of X given Y

    Probability of X given NOT Y?  No -- you cannot negate the variable you are conditioning on:
    P(X | ~Y) = 1 - P(X | Y)  -- NO!!!

Videos 3.10-3.12  Weather

notes: D1 means Day-1 from the introduction, assuming only two states {sunny, rainy}

quiz 1:
    P(D1=sunny) = 0.9
    P(D2=sunny | D1=sunny) = 0.8
    P(D2=rainy | D1=sunny) = ?
    from Negation: P(~X | Y) = 1 - P(X | Y)
        P(X|Y) = P(D2=sunny | D1=sunny) = 0.8, and rainy = ~sunny
        P(D2=rainy | D1=sunny) = 1 - 0.8 = 0.2

quiz 2:
    P(D2=sunny | D1=rainy) = 0.6
    P(D2=rainy | D1=rainy) = ?
    again from Negation: P(~X | Y) = 1 - P(X | Y)
        P(X|Y) = P(D2=sunny | D1=rainy) = 0.6, and rainy = ~sunny
        P(D2=rainy | D1=rainy) = 1 - 0.6 = 0.4

quiz 3:
    P(D2=sunny) = ?
    we know:
        a. P(D1=sunny) = 0.9
        b. P(D1=rainy) = 1 - 0.9 = 0.1
        c. P(D2=sunny | D1=sunny) = 0.8
        d. P(D2=sunny | D1=rainy) = 0.6
    so:
        P(D2=sunny) = c * a + d * b
                    = P(D2=s | D1=s) * P(D1=s) + P(D2=s | D1=r) * P(D1=r)
                    = 0.8 * 0.9 + 0.6 * 0.1
                    = 0.72 + 0.06 = 0.78
    the probability that D2=sunny is (the prob that D2=s given D1=s) times (the prob that D1=s)
    plus (the prob that D2=s given D1=r) times (the prob that D1=r)

Then, using the same dynamics (replace D1,D2 with D2,D3):
    P(D3=sunny) = ?
    we know:
        a. P(D2=sunny) = 0.78
        b. P(D2=rainy) = 1 - 0.78 = 0.22
        c. P(D3=sunny | D2=sunny) = 0.8   (note: these stay the same
        d. P(D3=sunny | D2=rainy) = 0.6    from day to day!!!)
    so:
        P(D3=sunny) = c * a + d * b
                    = P(D3=s | D2=s) * P(D2=s) + P(D3=s | D2=r) * P(D2=r)
                    = 0.8 * 0.78 + 0.6 * 0.22
                    = 0.624 + 0.132 = 0.756
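Not from the lecture -- a small Python sketch of the total-probability rule applied to the
weather quizzes, folding the same day-to-day transition probabilities forward one day at a time
(constant and function names are mine):

    P_SUNNY_GIVEN_SUNNY = 0.8   # P(D(n+1)=sunny | Dn=sunny)
    P_SUNNY_GIVEN_RAINY = 0.6   # P(D(n+1)=sunny | Dn=rainy)

    def next_day_sunny(p_sunny_today):
        """Total probability: P(next=s) = P(next=s|s)*P(s) + P(next=s|r)*P(r)."""
        p_rainy_today = 1.0 - p_sunny_today
        return (P_SUNNY_GIVEN_SUNNY * p_sunny_today
                + P_SUNNY_GIVEN_RAINY * p_rainy_today)

    p_d1 = 0.9                      # P(D1=sunny)
    p_d2 = next_day_sunny(p_d1)     # 0.78
    p_d3 = next_day_sunny(p_d2)     # 0.756
    print(p_d2, p_d3)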
Videos 3.13-3.16  Cancer

Probability of having (or not having) this kind of cancer:
    P(C)  = 0.01
    P(~C) = 0.99
Probability of getting a positive or negative test result if you have this cancer:
    P(+ | C) = 0.9
    P(- | C) = 0.1
Probability of the test incorrectly being positive (false positives):
    P(+ | ~C) = 0.2
and conversely, correctly being negative (true negatives):
    P(- | ~C) = 0.8

To start with, what are the "joint probabilities" of:

1. positive test and having cancer -- true positive
    P(+, C) = 0.009
    explain: P(+ | C) * P(C) = 0.9 * 0.01 = 0.009
    [prob of + test given you have cancer, times prob of having cancer]

2. negative test and having cancer -- false negative
    P(-, C) = 0.001
    explain: P(- | C) * P(C) = 0.1 * 0.01 = 0.001
    [prob of - test given you have cancer, times prob of having cancer]

3. positive test and NOT having cancer -- false positive
    P(+, ~C) = 0.198
    explain: P(+ | ~C) * P(~C) = 0.2 * 0.99 = 0.198
    [prob of + test given you don't have cancer, times prob of not having cancer]

4. negative test and NOT having cancer -- true negative
    P(-, ~C) = 0.792
    explain: P(- | ~C) * P(~C) = 0.8 * 0.99 = 0.792
    [prob of - test given you don't have cancer, times prob of not having cancer]

So, what's the probability that you have the cancer if you get a positive test?
    P(C | +) = 0.043
    explain: there are two ways to get a positive test:
        true positives:  P(+, C)  = 0.009
        false positives: P(+, ~C) = 0.198
    take the ratio of true positives to total positives:
        0.009 / (0.009 + 0.198) ~= 0.043
    note that the total, 0.207, is P(+)

Interesting point -- because the PRIOR probability of having cancer is so small (0.01),
the chance of a false positive test (0.198) is _much_ higher than the chance of a true
positive test (0.009), so a positive test only slightly raises the POSTERIOR probability (0.043).

Unit 3.7  Video 3.17  Bayes Rule -- Rev. Thomas Bayes, 18th century

    --->>>  P(A|B) = P(B|A) * P(A) / P(B)  <<<---

    P(A|B) -- posterior
    P(B|A) -- likelihood
    P(A)   -- prior
    P(B)   -- marginal likelihood

A is the cause -- cancer; B is some evidence -- a test result.

    P(A|B), "the posterior", is the "diagnostic direction" --
        we want to know how likely the cause is, given the evidence
    equals
    P(B|A), "the likelihood", the "causal direction" --
        how likely is the evidence, given the cause?
    times
    P(A), "the prior" -- how likely is the cause?
    divided by
    P(B), "the marginal likelihood" -- how likely is the evidence?

note: for P(B) see "total probability" above:
    P(B) = {sum over a of} P(B | A=a) * P(A=a)

In the cancer case above:
    P(C | +) = P(+ | C) * P(C) / P(+)
    the probability of having cancer given a positive test
    equals the probability of a positive test given you have cancer
    times the probability of having cancer
    divided by the probability of a positive test

To repeat the numbers from above:
    a. P(C)      = 0.01  -- prior
    b. P(~C)     = 0.99  -- negation of the prior
    c. P(+ | C)  = 0.9   -- likelihood
    d. P(+ | ~C) = 0.2   -- likelihood for the negated cause
and:
    e. P(+) -- "marginal likelihood" or "total probability"
       = c * a + d * b
       = P(+|C) * P(C) + P(+|~C) * P(~C)
       = 0.9 * 0.01 + 0.2 * 0.99
       = 0.009 + 0.198 = 0.207
so:
    P(C|+) -- posterior
       = c * a / e
       = 0.9 * 0.01 / 0.207
       = 0.043   (!!! the same value we got before !!!)

Unit 3.7a  Video 3.18  Bayes Network -- Bayes Rule Graphically

    (A) ---> (B)    where A is the cause and B is the effect (A=cancer, B=test result)

A is not observable, but B is.
we know:
    P(A) -- the probability of the cause, cancer = 1%
    P(B|A) and P(B|~A) -- the probability of the effect given each value of the cause

Causal reasoning: P(B|A) and P(B|~A) -- how likely is the effect, given the cause?
we want to know:
Diagnostic reasoning: P(A|B) and P(A|~B) -- how likely is the cause, given the effect?

There are 3 parameters: P(A), P(B|A), P(B|~A)

see image: bayesrule.jpg
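Not from the lecture -- a minimal Python sketch of the single-test cancer calculation:
Bayes rule with the marginal P(+) computed by total probability (variable names are mine):

    p_c        = 0.01   # prior P(C)
    p_pos_c    = 0.9    # likelihood P(+|C)
    p_pos_notc = 0.2    # false-positive rate P(+|~C)

    p_pos = p_pos_c * p_c + p_pos_notc * (1 - p_c)   # total probability P(+) = 0.207
    p_c_given_pos = p_pos_c * p_c / p_pos            # posterior P(C|+) ~= 0.043

    print(p_pos, p_c_given_pos)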
More Complex Bayes Networks

Bayes Rule (again):
    P(A|B) = P(B|A) * P(A) / P(B)
    P(B|A) -- likelihood
    P(A)   -- prior
    P(B)   -- marginal likelihood

P(B|A) * P(A) -- the likelihood and prior -- are easy to compute.
P(B) -- the marginal likelihood -- is not always so easy, but at least it's just
a function of B (no A's involved). P(B) is called the "normalizer".

Computing Bayes Rule -- using "normalization"

we can find the complementary event, not-A given B:
    P(~A|B) = P(B|~A) * P(~A) / P(B)
and we know that the two need to add to 1:
    P(A|B) + P(~A|B) = 1
leave out the P(B) normalizer in both to get "pseudo-probabilities",
i.e. P' is not a "real" probability at this point:
    P'(A|B)  = P(B|A)  * P(A)
    P'(~A|B) = P(B|~A) * P(~A)
to get a real probability, P' can be multiplied by some normalizer 'a'
(...he uses eta, the book uses alpha...):
    realP  = a * P'(A|B)
    real~P = a * P'(~A|B)
and 'a' is one over the sum of the two P' values (because they eventually need to add up to 1):
    a = 1 / ( P'(A|B) + P'(~A|B) )
    (note: 'a' = 1/P(B) as well!)

Unit 3.8a  Videos 3.20-3.21  Two Test Cancer

Two tests, with a net like:

      C
     / \
   T1   T2

with the same probabilities for either test:
    priors            negations (1 - prior)
    P(C)    = 0.01    P(~C)   = 0.99
    P(+|C)  = 0.9     P(-|C)  = 0.1
    P(-|~C) = 0.8     P(+|~C) = 0.2

What is the probability that you have cancer if both tests are positive?
    P(C | T1=+, T2=+) = P(C | ++) = 0.1698
because you multiply the probabilities:
    P(C|++)  = P(C)  * P(T1=+|C)  * P(T2=+|C)  / P(++)
    P(~C|++) = P(~C) * P(T1=+|~C) * P(T2=+|~C) / P(++)
and doing it with normalization avoids needing P(++);
using P(C)=0.01, P(~C)=0.99, P(+|C)=0.9, P(+|~C)=0.2:

         prior * T1+ * T2+ =  P'     / a      = P(C|++)
    C    0.01  * 0.9 * 0.9 =  0.0081 / 0.0477 = 0.1698
    ~C   0.99  * 0.2 * 0.2 =  0.0396 / 0.0477 = 0.8301
                      total:  0.0477   -- note: this is the joint P(++)

What is the probability that you have cancer if one test is + and one -?
    P(C | T1=+, T2=-) = P(C | +-) = 0.0056
using P(C)=0.01, P(+|C)=0.9, P(+|~C)=0.2, P(-|C)=0.1, P(-|~C)=0.8:

         prior * T1+ * T2- =  P'     / a      = P(C|+-)
    C    0.01  * 0.9 * 0.1 =  0.0009 / 0.1593 = 0.0056
    ~C   0.99  * 0.2 * 0.8 =  0.1584 / 0.1593 = 0.9943
                      total:  0.1593
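Not from the lecture -- a small Python sketch of the normalization trick for the two-test
quizzes: compute the un-normalized pseudo-probabilities for C and ~C, then divide by their
sum instead of computing P(evidence) directly (names and structure are mine):

    from math import prod

    p_c = 0.01
    likelihood = {"+": {"C": 0.9, "~C": 0.2},   # P(test result | C), P(test result | ~C)
                  "-": {"C": 0.1, "~C": 0.8}}

    def posterior_cancer(*tests):
        """P(C | test results), assuming the tests are conditionally independent given C."""
        pseudo_c    = p_c       * prod(likelihood[t]["C"]  for t in tests)
        pseudo_notc = (1 - p_c) * prod(likelihood[t]["~C"] for t in tests)
        return pseudo_c / (pseudo_c + pseudo_notc)   # normalize: divide by the total

    print(posterior_cancer("+", "+"))   # ~0.1698
    print(posterior_cancer("+", "-"))   # ~0.0056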
Unit 3.9  Videos 3.22-3.23  Conditional Independence

For the network above, C is the "hidden variable":

      C
     / \
   T1   T2

and T1 and T2 are assumed to be "conditionally independent", meaning:
if we know C, also knowing T1 will not change P(T2), i.e.:
    P(T2 | C, T1) = P(T2 | C)
    the probability of T2 given C and T1 equals the probability of T2 given just C

In the network, the directed arcs from C to the Tn's "cut off" the Tn's from each
other, and they are "conditionally independent".
T2 is conditionally independent of T1 _only_ if we actually know C.

For this directed network:

      A
     / \
    B   C

given A, then B and C are "conditionally independent":  B T C | A
(the upside-down T is "independence"... so B T C is "absolute independence"
and B T C | A is "conditional independence")

If you don't know A, can they still be conditionally independent?
NO -- because knowing B gives you information about A, which in turn influences
the results for C.

Given the same cancer net and probabilities, what is:
    P(T2=+ | T1=+)
the probability that T2 will be positive if we know T1 was positive and that
C is the parent of both?

This is the "total probability" of T2=+ given T1=+, i.e. add up the
probabilities of T2=+ for C and ~C, conditioned on T1=+:
    P(T2=+ | T1=+, C)  * P(C | T1=+)
  + P(T2=+ | T1=+, ~C) * P(~C | T1=+)

Due to conditional independence we can remove T1=+ from the first factor of each
term -- given we know C, knowledge of T1 gives no more information about T2 --
and use the likelihood values above:
    P(T2=+ | T1=+, C)  reduces to P(T2=+ | C)  = 0.9
    P(T2=+ | T1=+, ~C) reduces to P(T2=+ | ~C) = 0.2
and we already did the Bayes calculation of the second factors:
    P(C | T1=+)  = 0.043
    P(~C | T1=+) = 0.957
so we get:
    P(T2=+|C) * P(C|T1=+) + P(T2=+|~C) * P(~C|T1=+)
    = 0.9 * 0.043 + 0.2 * 0.957
    = 0.2301
Before, the total probability of getting a positive test was 0.207,
so the probability that the second test is + is now slightly higher.

Unit 3.9d  Video 3.24  Absolute and Conditional Independence

(note: using 'T' instead of the upside-down version in the video)

For this directed network (note: the video swapped the letters around from before...):

      C
     / \
    A   B

Does absolute independence imply conditional independence?  NO!
    A T B  does not imply  A T B | C
    even with absolute independence, things might not be conditionally independent
    (...explained later... see Conditional Dependence below)

Does conditional independence imply absolute independence?  NO!
    A T B | C  does not imply  A T B
    because C is the "intermediary" which influences both results;
    from above: knowing A gives you information about C, and knowing C in turn
    influences the results of B.

Unit 3.10  Video 3.25  Confounding Cause (a different type of Bayes network)

For this directed network:

    S   R
     \ /
      H

two independent hidden causes are confounded in one observation -- Sunny, Raise -> Happy.

example values:
    P(S) = 0.7
    P(R) = 0.01
and:
    P(H |  S,  R) = 1.0
    P(H | ~S,  R) = 0.9
    P(H |  S, ~R) = 0.7
    P(H | ~S, ~R) = 0.1
("a perfectly fine specification of a probability distribution"
 schip note: we have the prob of each cause and the conditional prob of the
 result for every combination of causes)

What's the probability of a Raise given that it's Sunny?
    P(R | S) = ??? = 0.01
I didn't get an explanation video, but I presume it's because R and S are
defined as absolutely independent events...

Unit 3.11  Videos 3.26-3.28  Explaining Away

If we know that we are happy, then sunny weather can "explain away" the cause
of the happiness. If we also know that it is sunny, then it's less likely that
we received a raise.
Or: if you see an effect that has multiple causes, then seeing one of the
causes can "explain away" the other potential causes.

Using the same network and probabilities as above --
What's the probability of a raise given that I'm happy and it's sunny?
    P(R | H, S) = ??? = 0.0142

Using Bayes Rule, the above inverts to:
    (...I don't understand this at all...)
    P(H | R, S) * P(R | S) / P(H | S)
    the probability of Happy given Raise and Sunny
    times the probability of Raise given Sunny
    divided by the total probability of Happy given Sunny
then we can change P(R|S) to P(R) because they are independent,
and expand P(H|S) to -----ugh---- the total probability:
    P(H|R,S) * P(R) + P(H|~R,S) * P(~R)
so:
    P(H|R,S) * P(R) / ( P(H|R,S) * P(R) + P(H|~R,S) * P(~R) )
    = 1.0 * 0.01 / ( 1.0 * 0.01 + 0.7 * 0.99 )
    ~= 0.0142
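Not from the lecture -- a small Python sketch of the explaining-away calculation for the
Sunny/Raise -> Happy network: condition on S, then normalize over R (names are mine):

    p_s = 0.7
    p_r = 0.01
    p_h = {( True,  True): 1.0,   # P(H |  S,  R)
           (False,  True): 0.9,   # P(H | ~S,  R)
           ( True, False): 0.7,   # P(H |  S, ~R)
           (False, False): 0.1}   # P(H | ~S, ~R)

    def p_raise_given_happy_and(sunny):
        """P(R | H, S=sunny): S and R are independent a priori, so P(R|S) = P(R)."""
        num   = p_h[(sunny, True)] * p_r                   # P(H | S=sunny,  R) * P(R)
        denom = num + p_h[(sunny, False)] * (1 - p_r)      # + P(H | S=sunny, ~R) * P(~R)
        return num / denom

    print(p_raise_given_happy_and(True))    # ~0.0142  (sunny explains the happiness away)
    print(p_raise_given_happy_and(False))   # ~0.0833  (not sunny, so a raise is more likely)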
Then... what's the probability of a raise given only that I'm happy?
    P(R | H) = ??? = 0.0185

To do it, use Bayes Rule to invert the equation:
    P(R | H) = P(H | R) * P(R) / P(H)

Calculate the total probability of happiness, P(H) --
i.e. sum over all four combinations of parent values of H
(note: P(S,R) = P(S) * P(R) = probability of (S and R), since S and R are independent):
    P(H) = P(H| S, R) * P( S, R)
         + P(H|~S, R) * P(~S, R)
         + P(H| S,~R) * P( S,~R)
         + P(H|~S,~R) * P(~S,~R)
         = 1.0 * (0.7 * 0.01)
         + 0.9 * (0.3 * 0.01)
         + 0.7 * (0.7 * 0.99)
         + 0.1 * (0.3 * 0.99)
         = 0.5245

Calculate the total probability of happiness given a raise, P(H|R) --
i.e. sum over the two values of the other parent, S:
    P(H|R) = P(H| S, R) * P( S)
           + P(H|~S, R) * P(~S)
           = 1.0 * 0.7 + 0.9 * 0.3
           = 0.97

Plug the numbers into the Bayes inversion equation:
    P(R | H) = P(H | R) * P(R) / P(H)
             = 0.97 * 0.01 / 0.5245
             = 0.0185

The lesson:
    P(R|H,S) = 0.0142
    P(R|H)   = 0.0185
If you know ST is happy AND it's sunny, the sunny part "explains away" the happy
result, so it's LESS likely he got a raise. But if you don't know whether it's
sunny (and he's still happy), then it's more likely he got the raise...

One last question -- what's the probability of a raise given happy and NOT sunny?
    P(R|H,~S) = ??? = 0.0833
By Bayes inversion this is:
    P(H|R,~S) * P(R|~S) / P(H|~S)
then we can change P(R|~S) to P(R) because they are independent,
and expand P(H|~S) to the total probability:
    P(H|R,~S) * P(R) + P(H|~R,~S) * P(~R)
so:
    P(H|R,~S) * P(R) / ( P(H|R,~S) * P(R) + P(H|~R,~S) * P(~R) )
    = 0.9 * 0.01 / ( 0.9 * 0.01 + 0.1 * 0.99 )
    ~= 0.0833

Unit 3.11f  Video 3.29  Conditional Dependence

From before:
    P(R | H, S)  = 0.0142 -- if he's happy and it's sunny, the prob of having
                             gotten a raise is only slightly higher
    P(R | H, ~S) = 0.0833 -- if he's happy and it's NOT sunny, the prob of having
                             gotten a raise is much higher
So... in the given Bayes net, S and R are independent, but H adds a dependence
between them.

Repeating, for this directed network:

    S   R
     \ /
      H

    P(R|H,S)  = 0.0142
    P(R|S)    = P(R) = 0.01
    P(R|H,~S) = 0.0833

In the absence of H, R and S are independent:  R T S
But if you know about H, then R and S become dependent:
    P(R|H,S) != P(R|H,~S)   -- 0.0142 != 0.0833
    given H, varying the value of S affects the probability of R

    --> So: independence does NOT imply conditional independence. <--
    see the answer in Unit 3.9d, Video 3.24

Unit 3.12  Videos 3.30-3.33  General Bayes Networks

(note that the parameter calculations below only work for binary variables;
 it's more complicated, as in the final exam, with more states.
 I think the formula is: InStates * (OutStates - 1),
 because you would normally have (InStates * OutStates) combinations of
 input/output probabilities, but since each row must sum to 1 you can drop
 one element from each output table...
 see cloudyCPT.jpg for the full tables)

Bayes networks define distributions over graphs of random variables.
Instead of enumerating all possible combinations of the variables,
the network is defined by probabilities that are local to each node:

    A   B        P(A), P(B)       -- one parameter each
     \ /
      C          P(C|A,B)         -- four parameters
     / \
    D   E        P(D|C), P(E|C)   -- two parameters each

The joint probability represented by a Bayes network is the product, over all
nodes, of each node's probability conditioned only on its incoming arcs:

    P(A,B,C,D,E) = P(A) * P(B) * P(C|A,B) * P(D|C) * P(E|C)
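Not from the lecture -- a Python sketch of that factored joint for the five-node network
above; the CPT numbers are made up purely to show the shape (all names and values are mine):

    from itertools import product

    p_a = 0.3                                          # P(A=True), assumed
    p_b = 0.6                                          # P(B=True), assumed
    p_c = {(True, True): 0.9, (True, False): 0.5,
           (False, True): 0.4, (False, False): 0.1}    # P(C=True | A, B), assumed
    p_d = {True: 0.8, False: 0.2}                      # P(D=True | C), assumed
    p_e = {True: 0.7, False: 0.3}                      # P(E=True | C), assumed

    def bern(p_true, value):
        """Probability of a binary variable taking `value` when P(True) = p_true."""
        return p_true if value else 1.0 - p_true

    def joint(a, b, c, d, e):
        """P(A,B,C,D,E) = P(A) * P(B) * P(C|A,B) * P(D|C) * P(E|C)."""
        return (bern(p_a, a) * bern(p_b, b) * bern(p_c[(a, b)], c)
                * bern(p_d[c], d) * bern(p_e[c], e))

    # sanity check: the factored joint sums to 1 over all 2^5 assignments
    print(sum(joint(*vals) for vals in product([True, False], repeat=5)))   # 1.0 (up to float rounding)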
The advantage is that this reduces the number of values needed to spec the full
joint probability: 5 binary variables need 2^5 - 1 = 31 values to spec every
combination, but the Bayes net shown needs only 10:

    P(A)     -- 1   (single probability; T/F sums to 1)
    P(B)     -- 1
    P(C|A,B) -- 4   (A and B can each be T or F, so 4 parent combinations)
    P(D|C)   -- 2   (C can be T or F, so 2 parent combinations)
    P(E|C)   -- 2

So... using Bayes nets you get a representation that scales significantly
better for large networks.  !!Key Advantage!!

Quiz: how many values are needed to spec this Bayes net?  13

        A            P(A)                    -- 1
       /|\
      B C D          P(B|A), P(C|A), P(D|A)  -- 2+2+2 = 6
      |  \|
      E   F          P(E|B), P(F|C,D)        -- 2+4   = 6

Any (boolean) variable that has K inputs needs 2^K values.

Video 3.33  Value of a Network

Quiz: how many values are needed to spec this Bayes net?  19

    A  B  C          P(A), P(B), P(C)          -- 1+1+1 = 3
     \ | /|
       D  |          P(D|A,B,C)                -- 2^3   = 8
      / | \|
     E  F  G         P(E|D), P(F|D), P(G|C,D)  -- 2+2+4 = 8

Quiz: how many parameters in the Bayes net from the beginning?  47
    ...note that the full joint distribution would take 2^16 - 1 = 65535 parameters...
    see image: bayesnet_intro.jpg

    row 1: 3 nodes w/ 0 in                  = 3
    row 2: 1 w/ 1 in, 1 w/ 2 in             = 6
    row 3: 1 w/ 1 in, 1 w/ 2 in, 4 w/ 0 in  = 10
    row 4: 2 w/ 1 in, 2 w/ 2 in, 1 w/ 4 in
           2*2^1 + 2*2^2 + 1*2^4            = 28

Unit 3.13  Videos 3.34-3.36  D-Separation, or Reachability

Quiz, for this Bayes network:

      A
     / \
    B   D
    |   |
    C   E

    is C independent of A?              no  -- A influences C by way of B
    is C ind of A given B  (C T A | B)? yes -- knowing B makes A not matter
    is C independent of D?              no  -- A influences both C and D
    is C ind of D given A  (C T D | A)? yes -- knowing A makes D not matter
    is E ind of C given D  (E T C | D)? yes -- knowing D makes E not matter

Rule: two variables are independent if they are not linked by just unknowns
(variables are independent if they are linked only through a known variable).
schip: a known variable on a direct path between variables cuts off the link
between the two sides -- it makes the link INACTIVE and the variables INDEPENDENT.
In the example: if you know B, everything "downstream" of B is independent of
everything upstream, so C is ind of A given B, and E is ind of C given B --
it works both ways... but knowing B doesn't make A and E independent.

Quiz, for this Bayes network:

    A   B
     \ /
      C
     / \
    D   E

    is A ind of E?          no  -- only unknowns in between
    is A ind of E given B?  no  -- only unknowns in between
    is A ind of E given C?  yes -- knowing C makes A not matter
    is A ind of B?          yes -- no incoming arcs
    is A ind of B given C?  no  -- "the explaining-away effect", conditional dependence
        (see the sunny/raise/happy example above): if we know A can cause C,
        then it's less likely that B caused C; and vice versa, if A is false
        and C is true, then B is more likely.

D-separation is the general study of conditional independence in Bayes networks.

    active triplets:   make variables dependent
    inactive triplets: make variables independent

For a chain of variables (note that B is in the middle):

    A -> B -> C

or for this directed tree of variables (note that B is at the top, in the middle):

      B
     / \
    A   C

    active:   if the value of B is unknown, then A and C are dependent
    inactive: if the value of B is known, then A and C are independent
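Not from the lecture -- a small Python check of the chain triplet A -> B -> C: with B
unknown, A and C are dependent; given B, they are independent. The CPT numbers are made up
purely for illustration, and all names are mine:

    from itertools import product

    p_a = 0.3                        # P(A=T), assumed
    p_b = {True: 0.9, False: 0.2}    # P(B=T | A), assumed
    p_c = {True: 0.7, False: 0.1}    # P(C=T | B), assumed

    def bern(p_true, v):
        return p_true if v else 1.0 - p_true

    def joint(a, b, c):
        # chain factorization: P(A,B,C) = P(A) * P(B|A) * P(C|B)
        return bern(p_a, a) * bern(p_b[a], b) * bern(p_c[b], c)

    def prob(event):
        return sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3)
                   if event(a, b, c))

    def cond(event, given):
        return prob(lambda a, b, c: event(a, b, c) and given(a, b, c)) / prob(given)

    # B unknown: P(C|A=T) != P(C|A=F)  -->  A and C are dependent
    print(cond(lambda a, b, c: c, lambda a, b, c: a),            # 0.64
          cond(lambda a, b, c: c, lambda a, b, c: not a))        # 0.22

    # B known: P(C|A=T,B=T) == P(C|A=F,B=T)  -->  A and C are independent given B
    print(cond(lambda a, b, c: c, lambda a, b, c: a and b),          # 0.7
          cond(lambda a, b, c: c, lambda a, b, c: (not a) and b))    # 0.7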
But for this directed tree of variables, conditional dependence reverses it
(note that B is at the bottom, in the middle):

    A   C
     \ /
      B

    active:   if the value of B is KNOWN, then A and C are dependent
    inactive: if the value of B is UNKNOWN, then A and C are independent

AND for this directed tree of variables (note that B and D are at the bottom, in the middle):

    A   C
     \ /
      B
      |
      D

If we know D, then we don't need to know B: if we know a successor of B, we
don't have to know B itself, because we can get knowledge of B from its successors.

    active:   if the value of _D_ is KNOWN, then A and C are dependent
    inactive: if the value of _D_ is UNKNOWN, then A and C are independent

see: bayes_independence.jpg

Quiz: see bayes_indQuiz.jpg for the network

    is F ind of A?          yes -- A & F both depend on D, but we don't know D
    is F ind of A given D?  no  -- we know D, the successor of B & E
    is F ind of A given G?  no  -- we know G, the successor of D
    is F ind of A given H?  yes -- no known variables in between

Video 3.37  Congratulations!

main points:
    Graph structure of Bayes networks
    Compact representation
    Conditional independence

...He hopes we enjoyed the unit... oy...