Various useful(?) probability equations and definitions

Conditional probability:
  P(A|B) = P(A,B) / P(B)

Joint probability:
  P(A,B) = P(A|B) * P(B) = P(B|A) * P(A)
  If you add another condition C, it becomes:
    P(A,B|C) = P(A|B,C) * P(B|C) = P(B|A,C) * P(A|C)
  Also note that P(A,B) is commutative, so:
    P(A,B) = P(B,A)

Logical operations:
  AND (notation is the upside-down V):
    P(A AND B) = P(B|A) * P(A)
    If A and B are independent, then one has no effect on the other, i.e.:
      P(A|B) = P(A), P(B|A) = P(B)
    and this becomes the product of the individual probabilities:
      P(A AND B) = P(A) * P(B)
  OR (notation is the right-side-up V):
    P(A OR B) = P(A) + P(B) - P(A AND B)
    ...the sum of the individual probabilities.
    note: The purpose of subtracting P(A AND B) is to remove what A and B
    have in common -- so it isn't counted twice.
  Also note that both are commutative, so:
    P(A AND B) = P(B AND A), P(A OR B) = P(B OR A)

Bayes Rule:
  P(A|B) = P(B|A) * P(A) / P(B)
  Inverts the conditional: (A given B) to (B given A).
  terminology:
    P(A|B) -- posterior
    P(B|A) -- likelihood
    P(A)   -- prior
    P(B)   -- marginal likelihood, total probability, aka the normalizer
  A is the cause -- e.g. cancer; B is some evidence -- e.g. a test result.

Normalization:
  Avoid calculating the total probability P(B) directly: compute the
  unnormalized values for both the true and false cases of A. The
  normalized results must add to 1.0, so their sum can serve as the
  normalizer. For Bayes Rule:
    P'(A|B)  = P(B|A)  * P(A)   -- note these are not _real_ probabilities
    P'(~A|B) = P(B|~A) * P(~A)  -- because they are not normalized by P(B)
    a = P'(A|B) + P'(~A|B)      -- sum 'em up
    P(A|B)  = P'(A|B)  / a      -- now normalize
    P(~A|B) = P'(~A|B) / a      -- note: a is the total probability P(B)
  (A worked sketch of this trick appears after the Conditional Independence
  section below.)

Total probability:
  The sum of the probabilities of a variable under all conditions:
    P(A) = P(A,B) + P(A,~B)
         = (P(A|B) * P(B)) + (P(A|~B) * P(~B))
  for multiple variables:
    P(A) = P(A,B,C) + P(A,B,~C) + P(A,~B,C) + P(A,~B,~C)

You can add conditions to any of these formulas. For example:
  P(A|B) = P(A,B) / P(B)
    turns into: P(A|B,C)   = P(A,B|C)   / P(B|C)
    or:         P(A|B,C,D) = P(A,B|C,D) / P(B|C,D)
  P(A|B) = P(B|A) * P(A) / P(B)
    turns into: P(A|B,C) = P(B|A,C) * P(A|C) / P(B|C)
  P(A) = P(A,B) + P(A,~B) = P(A|B) * P(B) + P(A|~B) * P(~B)
    turns into: P(A|C) = P(A,B|C) + P(A,~B|C)
                       = (P(A|B,C) * P(B|C)) + (P(A|~B,C) * P(~B|C))

Parameter counts:
  A boolean variable that has K inputs needs 2^K probability parameters.
  Example: this network needs 10 parameters in total (1 + 1 + 4 + 2 + 2):

    A   B      P(A), P(B)     -- 2^0 = one parameter each
     \ /
      C        P(C|A,B)       -- 2^2 = four parameters
     / \
    D   E      P(D|C), P(E|C) -- 2^1 = two parameters each

Conditional Independence:
  (The notation for "independence" is the upside-down 'T', but I'm using the
  right-side-up 'T' when necessary, so "A T C" means A is independent of C,
  and "A T C | B" means A is conditionally independent of C given B.)

      B
     / \
    A   C

  A and C are conditionally independent given B:
    P(A|B,C) = P(A|B)
  and it works for both values of C (+C means C is true, -C means C is false):
    P(A|B,+C) = P(A|B) and P(A|B,-C) = P(A|B)
  note: B and C don't also have to be conditionally independent given A.
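As a concrete illustration of the Bayes Rule and Normalization sections
above, here is a minimal Python sketch of the normalization trick. The
numbers (prior, test accuracies) are made-up values for the cancer/test
example, not real statistics:

# Bayes Rule via the normalization trick.
# A = has cancer, B = positive test. All numbers are illustrative.

p_a = 0.01             # P(A):   prior probability of cancer (assumed)
p_b_given_a = 0.9      # P(B|A): test sensitivity (assumed)
p_b_given_not_a = 0.2  # P(B|~A): false positive rate (assumed)

# Unnormalized "probabilities" -- not yet divided by P(B)
p_prime_a     = p_b_given_a * p_a              # P'(A|B)  = P(B|A)  * P(A)
p_prime_not_a = p_b_given_not_a * (1 - p_a)    # P'(~A|B) = P(B|~A) * P(~A)

# Their sum is the normalizer, which is also the total probability P(B)
a = p_prime_a + p_prime_not_a

p_a_given_b     = p_prime_a / a                # P(A|B)  ~= 0.0435
p_not_a_given_b = p_prime_not_a / a            # P(~A|B) ~= 0.9565

print(p_a_given_b, p_not_a_given_b, p_a_given_b + p_not_a_given_b)  # sums to 1.0

Note that a ends up equal to P(B) = P(B|A) * P(A) + P(B|~A) * P(~A), which
is exactly the total probability formula above.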
Conditional Dependence:

    A   C
     \ /
      B

  P(C|A) = P(C)           -- A and C are independent
  P(C|B,A) != P(C|B,~A)   -- but given B, C is conditionally dependent on A
  In the absence of B, A and C are independent: A T C.
  But if you know B, then A and C become dependent, i.e. given B, varying
  the value of A affects the probability of C.

D-separation is the study of conditional independence in Bayes networks:
  "active triplets":   the variables are dependent (the middle variable is
                       active in creating the dependency)
  "inactive triplets": the variables are independent (the middle variable is
                       inactive, not creating a dependency)

Examples:
  Using conditional independence -- for a simple directed chain of variables
  (note that B is in the middle):

    A -> B -> C

  or for a directed tree of variables (note that B is at the top and middle):

      B
     / \
    A   C

  active:   if the value of B is UNKNOWN, then A and C are dependent
  inactive: if the value of B is KNOWN, then A and C are independent

  But... conditional dependence reverses it -- for this directed tree of
  variables (note that B is at the bottom and middle):

    A   C
     \ /
      B

  active:   if the value of B is KNOWN, then A and C are dependent
  inactive: if the value of B is UNKNOWN, then A and C are independent

  And for this directed tree of variables (note that B and D are at the
  bottom and middle):

    A   C
     \ /
      B
      |
      D

  If we know D then we don't need to know B: if we know a successor of B,
  we don't have to know B itself, because we can get knowledge of B from
  its successors.

  active:   if the value of _D_ is KNOWN, then A and C are dependent
  inactive: if the values of _D_ (and B itself) are UNKNOWN, then A and C
            are independent
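To make the "explaining away" effect concrete, here is a small enumeration
sketch of the v-structure case above (A -> B <- C, where knowing B makes A
and C dependent). The specific model -- A and C as independent fair coin
flips and B = A OR C -- is an illustrative assumption; any B that depends
on both parents shows the same effect:

from itertools import product

# Joint distribution for the v-structure A -> B <- C:
# A and C are independent fair coins, and B is deterministically A OR C.
def joint(a, b, c):
    p = 0.5 * 0.5                       # P(A=a) * P(C=c), independent priors
    return p if b == (a or c) else 0.0  # B = A OR C

def prob(query, given=lambda a, b, c: True):
    """Conditional probability of `query` given `given`, by summing the
    joint over all eight (A, B, C) worlds."""
    num = den = 0.0
    for a, b, c in product([True, False], repeat=3):
        if given(a, b, c):
            den += joint(a, b, c)
            if query(a, b, c):
                num += joint(a, b, c)
    return num / den

# Without B, A and C are independent:
print(prob(lambda a, b, c: c))                               # P(C)     = 0.5
print(prob(lambda a, b, c: c, lambda a, b, c: a))            # P(C|A)   = 0.5

# Given B, A and C become dependent ("explaining away"):
print(prob(lambda a, b, c: c, lambda a, b, c: b and a))      # P(C|B,A)  = 0.5
print(prob(lambda a, b, c: c, lambda a, b, c: b and not a))  # P(C|B,~A) = 1.0

Intuitively, once B is known to be true, learning that A is false forces C
to be true -- only C is left to explain B -- so P(C|B,A) != P(C|B,~A).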