Bayes Classifiers
Maximum Likelihood vs. Laplacian Smoothing (K = 1)

Vocabulary: 12 distinct words. Spam (S): 3 messages, 9 words; ham (~S): 5 messages,
15 words. Totals: 8 messages, 24 words.

Maximum likelihood estimates: P(w|S) = count(w|S)/9, P(w|~S) = count(w|~S)/15

word     count|S  count|~S  P(w|S)    P(w|~S)
offer    1        0         0.111111  0
is       1        1         0.111111  0.066667
secret   3        1         0.333333  0.066667
click    1        0         0.111111  0
sports   1        5         0.111111  0.333333
link     2        0         0.222222  0
play     0        2         0         0.133333
today    0        2         0         0.133333
went     0        1         0         0.066667
event    0        1         0         0.066667
costs    0        1         0         0.066667
money    0        1         0         0.066667
totals   9        15

Laplacian smoothing (K = 1): add K to every count and K * 12 (the vocabulary size)
to each class total, so P(w|S) = (count(w|S)+1)/21 and P(w|~S) = (count(w|~S)+1)/27.

word     count+K|S  count+K|~S  P(w|S)    P(w|~S)
offer    2          1           0.095238  0.037037
is       2          2           0.095238  0.074074
secret   4          2           0.190476  0.074074
click    2          1           0.095238  0.037037
sports   2          6           0.095238  0.222222
link     3          1           0.142857  0.037037
play     1          3           0.047619  0.111111
today    1          3           0.047619  0.111111
went     1          2           0.047619  0.074074
event    1          2           0.047619  0.074074
costs    1          2           0.047619  0.074074
money    1          2           0.047619  0.074074
totals   21         27
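A minimal Python sketch to reproduce both tables. Only the counts above are assumed;
the underlying training messages are not listed in these notes.

# Word counts per class, copied from the tables above.
counts_S  = {"offer": 1, "is": 1, "secret": 3, "click": 1, "sports": 1, "link": 2,
             "play": 0, "today": 0, "went": 0, "event": 0, "costs": 0, "money": 0}
counts_NS = {"offer": 0, "is": 1, "secret": 1, "click": 0, "sports": 5, "link": 0,
             "play": 2, "today": 2, "went": 1, "event": 1, "costs": 1, "money": 1}

VOCAB = len(counts_S)                                        # 12 distinct words
N_S, N_NS = sum(counts_S.values()), sum(counts_NS.values())  # 9 and 15

def p_ml(word, counts, total):
    """Maximum-likelihood estimate: raw count / total words in the class."""
    return counts[word] / total

def p_laplace(word, counts, total, k=1):
    """Laplace-smoothed estimate: add k per word, k * |vocab| to the total."""
    return (counts[word] + k) / (total + k * VOCAB)

print(p_ml("secret", counts_S, N_S))       # 0.333333...
print(p_ml("today", counts_S, N_S))        # 0.0 -> ML zeroes out unseen words
print(p_laplace("secret", counts_S, N_S))  # 4/21 = 0.190476...
print(p_laplace("today", counts_S, N_S))   # 1/21 = 0.047619...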
Note: the lecturer computes the priors from the message ratio, not the word ratio.
That is correct: P(S) is the probability that a whole message is spam, so it is
estimated per message, and the K = 1 smoothing adds K per class (2 classes here):

Maximum likelihood:  P(S) = 3/8 = 0.375            P(~S) = 5/8 = 0.625
Laplace, K = 1:      P(S) = (3+1)/(8+2) = 0.4      P(~S) = (5+1)/(8+2) = 0.6
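A quick check of the prior arithmetic; the number of classes plays the role that
the vocabulary size played above (variable names are mine, not the lecture's):

M_S, M_NS = 3, 5                                      # spam / ham message counts
K, CLASSES = 1, 2

prior_S_ml = M_S / (M_S + M_NS)                       # 3/8  = 0.375
prior_S_ls = (M_S + K) / (M_S + M_NS + K * CLASSES)   # 4/10 = 0.4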
Under the naive Bayes assumption, the class-conditional probability of a message
factorizes into a product over its words, so each joint probability below is a
product of per-word probabilities times the prior.

Worked example 1 -- maximum likelihood, message M = "secret is secret":

P(S|M) = P(M|S) P(S) / P(M)
P(M|S) P(S)   = P(secret|S) P(is|S) P(secret|S) P(S)
              = 0.333333 * 0.111111 * 0.333333 * 0.375  = 0.0046296
P(M|~S) P(~S) = P(secret|~S) P(is|~S) P(secret|~S) P(~S)
              = 0.066667 * 0.066667 * 0.066667 * 0.625  = 0.0001852
P(M)          = P(M|S) P(S) + P(M|~S) P(~S)             = 0.0048148
P(S|M)        = 0.0046296 / 0.0048148                   = 0.9615385

Worked example 2 -- Laplacian smoothing (K = 1), message M = "today is secret":

P(S|M) = P(M|S) P(S) / P(M)
P(M|S) P(S)   = P(today|S) P(is|S) P(secret|S) P(S)
              = 0.047619 * 0.095238 * 0.190476 * 0.4    = 0.0003455
P(M|~S) P(~S) = P(today|~S) P(is|~S) P(secret|~S) P(~S)
              = 0.111111 * 0.074074 * 0.074074 * 0.6    = 0.0003658
P(M)          = 0.0003455 + 0.0003658                   = 0.0007113
P(S|M)        = 0.0003455 / 0.0007113                   = 0.4857571
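A sketch of the same two posterior computations, continuing the Python above (it
reuses counts_S, counts_NS, N_S, N_NS, p_ml, and p_laplace; posterior_spam is a
name introduced here, not from the lecture):

def posterior_spam(msg, p_w_S, p_w_NS, p_S, p_NS):
    """Naive Bayes: P(S|M) = P(M|S)P(S) / (P(M|S)P(S) + P(M|~S)P(~S))."""
    joint_S, joint_NS = p_S, p_NS
    for w in msg:                 # factorize P(M|class) over the words
        joint_S  *= p_w_S(w)
        joint_NS *= p_w_NS(w)
    return joint_S / (joint_S + joint_NS)

# Maximum likelihood, priors 3/8 and 5/8:
ml = posterior_spam(["secret", "is", "secret"],
                    lambda w: p_ml(w, counts_S, N_S),
                    lambda w: p_ml(w, counts_NS, N_NS),
                    3/8, 5/8)
# Laplace K=1, priors 4/10 and 6/10:
ls = posterior_spam(["today", "is", "secret"],
                    lambda w: p_laplace(w, counts_S, N_S),
                    lambda w: p_laplace(w, counts_NS, N_NS),
                    4/10, 6/10)
print(ml)   # 0.9615384...
print(ls)   # 0.4857571...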