Interesting Problems in Probability [from E. T. Jaynes]

I decided to pick up where I left off in my study of probability, so I returned to the trusty “Probability Theory: The Logic of Science” by E. T. Jaynes.

I’d like to solve two interesting problems I encountered in the second chapter (which for some reason I couldn’t solve before, but can now). The trick to solving them really just involves using simple set theory visualizations.

The first problem is stated as follows:

Is it possible to find a general formula for \large{p(C|A + B)}, analogous to (2.66), from the product and sum rules? If so, derive it; if not, explain why this cannot be done.

For more context, I’ll quickly explain equation (2.66).

It expresses the probability \large{p(A + B | C)} as a sum of individual conditional probabilities.

The most obvious first step is to apply De Morgan’s law to the sum, giving

\large{p(A + B | C) = 1 - p(\overline{A + B} | C) = 1 - p(\overline{A}\,\overline{B} | C)}

This uses the simple relations \large{\overline{A + B} = \overline{A}\,\overline{B}} and \large{p(X | C) = 1 - p(\overline{X} | C)}. Applying the product rule and then the sum rule,

\large{ 1 - p(\overline{A}\,\overline{B} | C)  = 1 - p(\overline{A}|C)p(\overline{B}|\overline{A}C)}

\large{= 1 - p(\overline{A}|C)(1 - p(B|\overline{A}C))}

Expanding the bracket yields

\large{= (1 - p(\overline{A}|C))   + p(\overline{A}|C)p(B|\overline{A}C)}

\large{= p(A|C)  + p(\overline{A}B|C)}

\large{= p(A|C)   + p(B|C)p(\overline{A}|BC)}

\large{= p(A|C)  + p(B|C)(1 - p(A|BC))}

\large{= p(A|C)  + p(B|C) -  p(B|C)p(A|BC)}

\large{= p(A|C)  + p(B|C) -  p(AB|C)}

Finally, the relation in terms of the individual conditional probabilities is

\large{ p(A + B|C) = p(A|C) + p(B|C) - p(AB|C)}
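As a quick sanity check on the derivation above, here is a minimal Python sketch. The joint weights for the truth values of A and B given C are made up purely for illustration (they are not from Jaynes); any non-negative weights summing to 1 should behave the same way.

```python
from fractions import Fraction

# Hypothetical joint distribution p(A=a, B=b | C); the weights are
# illustrative assumptions and sum to 1.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(3, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}

def p(pred):
    """Probability (given C) that the predicate on (a, b) holds."""
    return sum(w for (a, b), w in joint.items() if pred(a, b))

p_A = p(lambda a, b: a)
p_B = p(lambda a, b: b)
p_AB = p(lambda a, b: a and b)
p_A_or_B = p(lambda a, b: a or b)

# First step of the derivation: 1 - p(not-A | C) * p(not-B | not-A, C)
p_notA = 1 - p_A
p_notB_given_notA = p(lambda a, b: (not a) and (not b)) / p_notA
first_step = 1 - p_notA * p_notB_given_notA

print(p_A_or_B, first_step, p_A + p_B - p_AB)
```

All three printed values agree, which is what the chain of equalities claims.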

The derivation sure as hell looks tedious if your logic is as rusty as mine, but the following intuitive explanation makes it much easier.

Imagine the sample space shown below as a set of discrete points.

The sample space is the set of all the discrete points. The green region encompasses the equally likely outcomes under C, while the two ellipses represent the equally likely outcomes under A and B respectively.

Now if we’re given the information that event C has happened, then it is clear from the diagram that some of the outcomes in A and some of the outcomes in B are still possible. Intuitively, the probability of event A or B occurring given that event C has occurred is simply the number of discrete points of A and B that overlap C, divided by the number of points in C.

This can be expressed mathematically as \large{ p(A + B|C) = p(A|C) + p(B|C)}, which is quite close to the mathematically derived formula. The third term is missing because events A and B have no overlapping points in this sample space.

Now consider a similar space where events A and B have overlapping points in the sample space.

If we just summed the discrete points in the two event spaces, the two overlapping points would be counted twice. Hence we need to subtract them once when we add event B after counting event A.
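The double counting described above can be sketched with plain Python sets. The point labels and the membership of A, B and C below are hypothetical (the original diagram is not reproduced here); I only assume equally likely points and two points shared by A and B inside C.

```python
from fractions import Fraction

def cond_prob(event, given):
    """p(event | given) for equally likely points: |event & given| / |given|."""
    return Fraction(len(event & given), len(given))

# Hypothetical sample space: points labelled 0..11, all equally likely.
C = set(range(0, 8))
A = {0, 1, 2, 3, 8}
B = {2, 3, 4, 9}      # points 2 and 3 lie in both A and B

naive = cond_prob(A, C) + cond_prob(B, C)        # counts points 2 and 3 twice
corrected = naive - cond_prob(A & B, C)          # subtract the overlap once
direct = cond_prob(A | B, C)                     # count the union directly

print(naive, corrected, direct)
```

The naive sum overshoots, while the corrected value matches the direct count of the union.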

Hence \large{p(A + B|C) = p(A|C) + p(B|C) - p(AB|C)}. This intuitive explanation agrees with the set theory conclusions drawn from Venn diagrams and makes conclusion (2.66) easier to visualize. Now back to the question:

Is it possible to find a general formula for \large{p(C|A + B)}, analogous to (2.66), from the product and sum rules? If so, derive it; if not, explain why this cannot be done.

The question kind of reverses our initial intuitive conclusion, i.e. can we compute the conditional probability of C given that either event A or event B occurs? From the diagram we see that a general formula is not possible, since we would also need the conditional probabilities \large{p(A|\overline{C}),  p(B|\overline{C}),  p(AB|\overline{C})}, none of which can be obtained from Boolean identities; they must be provided beforehand.

As can be seen from the diagram below, the parts of events A and B that lie in the region where C does not occur are not constrained by anything we know about them inside C. Hence no general formula can be given.
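To make this argument concrete, here is a small sketch with two made-up sample spaces (the sets are my own illustrative assumptions). Both give exactly the same values for the three quantities on the right-hand side of (2.66), yet they disagree on \large{p(C|A + B)}, because A and B differ only outside C.

```python
from fractions import Fraction

def cond_prob(event, given):
    """p(event | given) for equally likely points: |event & given| / |given|."""
    return Fraction(len(event & given), len(given))

def summary(A, B, C):
    """Return the (2.66) ingredients and the quantity the exercise asks for."""
    inputs = (cond_prob(A, C), cond_prob(B, C), cond_prob(A & B, C))
    target = cond_prob(C, A | B)
    return inputs, target

C = {0, 1, 2, 3}

# Space 1: A and B lie entirely inside C.
inputs_1, target_1 = summary({0, 1}, {2}, C)

# Space 2: same points inside C, plus extra points of A and B outside C.
inputs_2, target_2 = summary({0, 1, 4, 5}, {2, 6}, C)

print(inputs_1 == inputs_2)   # same p(A|C), p(B|C), p(AB|C)
print(target_1, target_2)     # different p(C|A+B)
```

So the ingredients of (2.66) alone do not pin down \large{p(C|A + B)}; something about the behaviour of A and B when C is false has to be supplied as well.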

The second question is a lot simpler, holding the basic rules in mind:

Now suppose that we have a set of propositions \large{\{ A_1, \hdots , A_n\}} which on information \large{X} are mutually exclusive: \large{ p(A_i A_j | X) = p(A_i | X) \delta_{ij} }. Show that \large{ p(C | [A_1 + \hdots + A_n]X) } is a weighted average of the separate plausibilities \large{ p(C | A_i X) }:

\large{ p(C | [A_1 + \hdots + A_n]X)  = \dfrac{\Sigma_i p( A_i | X) p(C | A_i X)}{\Sigma_i p(A_i| X)}}

First, treat the disjunction of the propositions as a single proposition:

\large{ p(C | [A_1 + \hdots + A_n]X)  = p(C | A_T X) } where \large{[A_1 + \hdots + A_n] =  A_T  }

Recalling the product rule,

\large{  p(C A_T | X)    =p(A_T |X)p(C |A_T X)   }


\large{p(C |A_T X) = \dfrac{ p(C A_T | X)}{p(A_T |X)} }

\large{p(C |[A_1 + \hdots + A_n] X) = \dfrac{ p( [A_1 + \hdots + A_n] C| X)}{p([A_1 + \hdots + A_n] |X)} }

\large{p(C |[A_1 + \hdots + A_n] X) = \dfrac{ p( [A_1C + \hdots + A_nC]| X)}{p([A_1 + \hdots + A_n] |X)} }

Keep in mind that all the propositions \large{A_i} are mutually exclusive, i.e. no two of them are simultaneously true, so no intersection occurs in the set theoretic sense. The sum rule then reduces to a plain sum in both the numerator and the denominator. Hence

\large{p(C |[A_1 + \hdots + A_n] X) = \dfrac{ p(A_1C|X) + \hdots +p( A_nC| X)}{p(A_1|X) + \hdots +p( A_n| X)} }

The numerator terms can be expanded further using the product rule:

\large{p(C |[A_1 + \hdots + A_n] X) = \dfrac{ p(A_1|X)p(C|A_1X) + \hdots +p(A_n|X)p(C|A_nX)}{p(A_1|X) + \hdots +p( A_n| X)} = \dfrac{\Sigma_i p( A_i | X) p(C | A_i X)}{\Sigma_i p(A_i| X)}}
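The weighted-average result can also be checked by counting points. This is only a sketch: the background set X, the three disjoint sets standing in for the \large{A_i}, and the set C are all hypothetical choices of mine, with every point taken as equally likely.

```python
from fractions import Fraction

def cond_prob(event, given):
    """p(event | given) for equally likely points: |event & given| / |given|."""
    return Fraction(len(event & given), len(given))

# Hypothetical background information X: twelve equally likely points.
X = set(range(12))

# Three mutually exclusive propositions (pairwise disjoint sets) and some C.
A_list = [{0, 1, 2}, {3, 4}, {5, 6, 7, 8}]
C = {0, 3, 4, 5, 9, 10}

# Left-hand side: condition directly on the disjunction A_1 + ... + A_n.
A_total = set().union(*A_list)
lhs = cond_prob(C, A_total)

# Right-hand side: weights p(A_i | X) and separate plausibilities p(C | A_i X).
weights = [cond_prob(A, X) for A in A_list]
plausibilities = [cond_prob(C, A) for A in A_list]
rhs = sum(w * q for w, q in zip(weights, plausibilities)) / sum(weights)

print(lhs, rhs)
```

Both sides come out equal, and the weights show that each \large{p(C | A_i X)} contributes in proportion to how plausible \large{A_i} is on X.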

Pretty easy and straightforward. A good way to visualize this intuitively is given below.

The reason for the weighted average becomes very clear, as the event space of C doesn’t completely subsume all of the mutually exclusive propositions.

This was fun!
