How new evidence revises current belief.

Kevin S. Van Horn (ksvhsoft@xmission.com)
Tue, 08 Jun 1999 22:31:44 -0600

We have two possible outcomes from pushing a button: H or T. David Poole
postulated three situations:

1) We've seen 0 occurrences of H and 0 occurrences of T.
2) We've seen 2 H's and 2 T's.
3) We've seen 500,000 H's and 500,000 T's.

He noted that in each case, if asked whether the next result would be H or T, we
would say "I don't know", and used this to argue that "I don't know" means "P(H)
= 0.5."

WANG Pei replied:

> [...] what distinguishes ignorance and known probability is how new
> evidence will revise current belief. After a new H (or T) is
> observed, the above three cases become different. I'd like to know
> how this difference can be captured by BN.

Oh, this is an easy one. Bayesians do this kind of thing all the time. Here's
your Bayesian network:

theta --+-------------> x_1
        +-------------> x_2
        +-------------> x_3
        ...
        +-------------> x_500000
        ...

Theta is a real-valued variable taking values in [0, 1]. The various x_i are
independent given the value of theta, with identical distributions:

P(x_i = H | theta = R) = R
P(x_i = T | theta = R) = 1 - R

Theta itself must be given some prior distribution, representing our state of
information as to what the long-term frequencies of H's and T's might be.
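To make the belief revision concrete, here is a minimal sketch in Python,
assuming (my choice, purely for illustration) a uniform Beta(1, 1) prior over
theta. With a Beta(a, b) prior and observed counts, the posterior over theta is
Beta(a + #H, b + #T), and P(next = H) is the posterior mean:

    from fractions import Fraction

    def predictive_h(n_h, n_t, a=1, b=1):
        # P(next = H) under a Beta(a, b) prior, after seeing n_h H's and
        # n_t T's: the posterior over theta is Beta(a + n_h, b + n_t), and
        # its mean is the predictive probability of another H.
        return Fraction(a + n_h, a + n_h + b + n_t)

    for n in (0, 2, 500000):            # Poole's three cases
        before = predictive_h(n, n)     # P(H) after n H's and n T's
        after = predictive_h(n + 1, n)  # ... and then one more H
        print(n, float(before), float(after))

All three cases print 0.5 for the current belief; after one new H the revised
beliefs are 2/3, 4/7, and roughly 0.5000005. The more evidence already in hand,
the less one new observation moves the belief -- which is precisely the
difference WANG Pei points to.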

The prior over theta may look like some second-order probability -- a
probability of a probability -- but it is not. Theta is merely another
parameter.

One way of thinking of this is to imagine gathering together in one variable s
all the systematic factors that could influence whether we get an H or T when
pushing the button. Our state of information about any other factors is such
that we consider their values to be independent from one trial to the next. We
then construct a prior distribution for s. But we notice that the only thing
that interests us about s is

P(x_i = H | s = S).

So we group the different possible values S for s into equivalence classes
according to the conditional probabilities they give for "x_i = H", and label
each equivalence class by its defining conditional probability. We then define
theta to be the equivalence class to which s belongs, and we obtain the
distribution over theta from the distribution over s.
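A toy numerical version of this construction (all factor values and numbers
below are invented for illustration) shows how the distribution over theta is
inherited from the distribution over s:

    from collections import defaultdict
    from fractions import Fraction as F

    # Hypothetical systematic factor s: a prior over its values, plus the
    # conditional probability P(x_i = H | s = S) each value S induces.
    prior_s     = {"S1": F(2, 10), "S2": F(3, 10), "S3": F(4, 10), "S4": F(1, 10)}
    p_h_given_s = {"S1": F(1, 2),  "S2": F(9, 10), "S3": F(1, 2),  "S4": F(9, 10)}

    # theta is the equivalence class of s, labeled by P(x_i = H | s = S);
    # its prior is the total prior probability of each class.
    prior_theta = defaultdict(F)
    for S, p in prior_s.items():
        prior_theta[p_h_given_s[S]] += p

    for r, p in sorted(prior_theta.items()):
        print(f"P(theta = {r}) = {p}")
    # P(theta = 1/2) = 3/5
    # P(theta = 9/10) = 2/5

Here four values of s collapse into two values of theta, and everything
relevant to predicting x_i survives the collapse.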

By the way, as to the general question of how new evidence revises current
belief, Bayes' rule

P(A | B, X) = P(A | X) * P(B | A, X) / P(B | X)

can be thought of simply as a rule for how a rational being should update its
beliefs in the face of new evidence. (Here B is the new evidence.)
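As a worked instance, reuse the toy numbers above, with the background
information X left implicit. Let A be "theta = 9/10" and let the new evidence B
be "x_1 = H". Then P(B | X) = (3/5)(1/2) + (2/5)(9/10) = 33/50, and Bayes' rule
gives P(A | B, X) = (2/5)(9/10) / (33/50) = 6/11. A single observed H has
raised our belief in the H-favoring value of theta from 0.40 to about 0.55.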