Re: Just one message on random variables

Kevin S. Van Horn (ksvhsoft@xmission.com)
Fri, 19 Jun 1998 09:03:38 -0600

Paul Krause wrote:

> The difficulty I have with all this is that in the context of learning,
> your "Bayesian probabilities" gradually turn into physical probabilities.

With the possible exception of quantum mechanics, there is no such thing
as a "physical" probability. Probabilities are not physical properties;
they are summaries of our state of knowledge in the face of missing
information. For example, the probabilities used in thermodynamics are
merely a reflection of our incomplete knowledge (the fact that we do not
know the precise location and momentum of every particle in the system,
having instead only ensemble averages).

Another example: When you say that a flipped coin has a 1/2 probability of
landing heads up, this is a statement about *you*, not a statement about the
coin. There is no physical property of the coin that gives it a probability
1/2 (or 3/4, or 1/3, etc.) of landing heads up. In fact, if you were to
measure the initial location, orientation, angular momentum, linear momentum,
and mass distribution of the coin sufficiently accurately, you could predict
exactly which way it would fall. We say there is a 1/2 probability of heads
simply because the way the coin falls usually has a very sensitive dependence
on the above initial conditions, and we have insufficient information to
make reliable predictions about how it will fall. Note, however, that it
is possible to flip a coin so that it appears to tumble about just as in a
"normal" coin flip, yet lands with highly predictable and repeatable
results.

Even for quantum mechanics, there are some interpretations that treat its
indeterminacy as resulting from missing information.

> Analogously, I start with the premise that probability is a measure and
> you get the value as best you can.

I have to disagree with this premise. Probability is not a physical
property or quantity that can be measured. I challenge you to show me a
probability that is an actual physical property, and not just a statement
of one's state of knowledge.

> [...] when do your
> Bayesian probabilities suddenly turn into physical probabilities?

Let's look at an example that may clarify matters. Suppose that we have N
different coin-flipping machines, for some large value of N. Because of the
sensitivity to initial conditions, the presence of vibrations, and myriad
other factors not under our control and unobserved by us, we can't predict
the outcome of any one coin toss. However, after running each machine for a
very long time, we find that machine i produces heads with a relative
frequency of i/N, and there is no discernible pattern linking separate coin
tosses on the same or different machines. So we summarize this information
by saying that each coin toss is independent of the others, with each coin
toss from machine i having a probability i/N of heads. I believe these are
what you would consider "physical" probabilities.
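
To make this setup concrete, here is a minimal sketch in Python (the
function name "flip" and the use of a pseudorandom generator are purely
illustrative stand-ins for a machine whose outcomes are deterministic but
unpredictable to us):

    import random

    def flip(i, N):
        # Stand-in for machine i: a Bernoulli source whose long-run
        # relative frequency of heads is i/N.
        return 'H' if random.random() < i / N else 'T'

    # Example: 10000 flips from machine 7 of N = 10 give roughly 70% heads.
    N = 10
    flips = [flip(7, N) for _ in range(10000)]
    print(flips.count('H') / len(flips))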

The machines are identical in appearance, and one is rolled out. We don't
know which machine it is. We are going to do a number of coin flips to try
to determine which machine it is. One might say we are trying to learn the
probability of heads.

Initially, we have P(machine i) = 1/N for all i, since our state of
knowledge about the identity of the machine is symmetrical between the
various possibilities (i.e., we have a uniform prior). The data we
collect from coin flipping will produce, via Bayes' rule, a posterior
distribution over the machines -- a probability P(machine i | data) for
each i. What is the probability of a head on the next coin flip? It is
    P(H | data) = SUM i:: P(machine i | data) * P(H | machine i)
                = SUM i:: (i/N) * P(machine i | data)
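
Here is a rough sketch of that calculation (the names "posterior" and
"prob_heads" are my own; also, multiplying raw likelihoods as below will
underflow for very long data sequences, where one would work with logs
instead):

    # Machines are numbered 1..N, with P(H | machine i) = i/N.
    def posterior(data, N):
        # Posterior P(machine i | data) via Bayes' rule, starting from
        # the uniform prior P(machine i) = 1/N.  data is a sequence of
        # 'H'/'T' outcomes.
        heads = data.count('H')
        tails = len(data) - heads
        unnorm = {i: (1.0 / N) * (i / N) ** heads * (1 - i / N) ** tails
                  for i in range(1, N + 1)}
        total = sum(unnorm.values())
        return {i: u / total for i, u in unnorm.items()}

    def prob_heads(post, N):
        # P(H | data) = SUM i:: P(machine i | data) * P(H | machine i)
        return sum(p * i / N for i, p in post.items())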

If it is machine j producing the coin flips, then as we get more and more
data, P(machine j | data) will converge to 1, and P(H | data) will converge
to j/N = P(H | machine j). But there is no point at which "Bayesian"
probabilities turn into "physical" probabilities; the two probabilities
(which are, in fact, both "Bayesian" probabilities) remain distinct because
they have different conditioning information. What you would call a
"physical" probability is
P(H | machine j).
What you would call a "Bayesian" probability is
P(H | data).
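
Feeding simulated flips into the sketches above shows this convergence
numerically (illustrative only; this reuses flip, posterior, and
prob_heads from the earlier sketches, with machine j = 7 and N = 10
chosen arbitrarily):

    # Simulate flips from machine j and watch P(machine j | data)
    # approach 1 while P(H | data) approaches j/N = 0.7.
    N, j = 10, 7
    data = []
    for n in (10, 100, 1000):
        data += [flip(j, N) for _ in range(n - len(data))]
        post = posterior(data, N)
        print(n, round(post[j], 3), round(prob_heads(post, N), 3))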

Kevin S. Van Horn
ksvhsoft@xmission.com