Re: Problem with the degree of belief interpretation

Kevin S. Van Horn (kevin@localhost.localdomain)
Sun, 26 Jul 1998 11:49:12 -0600

David,

I finally found the time to read your paper "Reconciling Bayesian and
Non-Bayesian Analysis." Here are my comments.

1. Your paper assumes throughout that there do in fact exist such
things as objective, physical probabilities. Correct me if I'm wrong,
but this assumption is crucial to the conclusions of your paper. Take
away this assumption, and most of your conclusions can no longer be
supported. (Note that I'm not debating the correctness of your
theorems; I'm arguing against your interpretation of them.)

I've presented arguments against the notion of physical probabilities
on this list, and in favor of a view of probability distributions as
representing a state of knowledge. Rather than repeat those
arguments, I'm going to email you copies of those posts. I would
be interested in hearing your response to those arguments.

Let me also mention that the view of probabilities as expressing a
state of knowledge is the first really satisfactory definition of what
probabilities mean that I have ever encountered. You appear to be a
frequentist, from what you have written. Yet the usual frequentist
attempts to define what a probability means, in terms of long-run
frequencies, have always suffered from operational
difficulties and a certain circularity. Offer me your best
frequentist definition of what it means to say that "the probability
of A given B is p", and I bet I can find problems with it.

The one case where I have allowed that physical probabilities *might*
exist is in QM. But even this case is in dispute among physicists,
many of whom do not accept the Copenhagen interpretation of QM. It's
been a long time since I did physics, but if I understand correctly,
the many-worlds interpretation of QM does not posit indeterministic
physical behavior or physical probabilities. A few years ago I read
about another view of QM in which the wave function is viewed as an
actual physical entity, and the apparently stochastic nature of QM is
due to incomplete information. And statistical physicist E. T. Jaynes
argues against the notion of physical probabilities in QM in Chapter
10 of his book, _Probability Theory: The Logic of Science_, under the
heading "But What About Quantum Theory". (This chapter can be found at
ftp://bayes.wustl.edu/pub/Jaynes/book.probability.theory/postscript/cc10k.ps.
The entire chapter is an argument against physical probabilities.)

But even if QM does in fact exhibit physical probabilities, this would
be of little practical importance in most statistical problems, in
which QM effects are negligible, and the physics of the situation is,
to a high degree of accuracy, entirely deterministic. For example,
consider games of chance. The outcome of a die roll, or a spin of the
roulette wheel, can be calculated to a high degree of accuracy from
the initial conditions, using the laws of physics. So why do we think
of these as probabilistic phenomena? It is because they are highly
sensitive to initial conditions that are difficult both to control
and to observe. The uncertainty in the outcome is a result of the
limitations of our knowledge, not of any indeterministic physical
process.
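
If you want to see this sensitivity concretely, here is a toy Python
sketch (mine, not from your paper; the logistic map is just a
convenient stand-in for a chaotic deterministic system, not a model
of an actual die). Two trajectories whose initial conditions differ
by one part in a billion disagree completely within about forty steps:

    # Toy illustration of sensitive dependence on initial conditions.
    # The logistic map x -> 4x(1-x) stands in for the deterministic
    # but chaotic dynamics of a tumbling die.
    def trajectory(x, steps):
        xs = []
        for _ in range(steps):
            x = 4.0 * x * (1.0 - x)
            xs.append(x)
        return xs

    a = trajectory(0.123456789, 50)
    b = trajectory(0.123456790, 50)   # initial gap: one part in 1e9

    for i in (0, 10, 20, 30, 40, 49):
        print(i, abs(a[i] - b[i]))
    # The gap roughly doubles each step, growing from 1e-9 to order 1:
    # the dynamics are perfectly deterministic, yet without essentially
    # perfect knowledge of the initial condition the outcome is
    # unpredictable.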

And this is generally the case in statistical problems. They are
characterized by a lack of sufficient information (and, in many cases,
computing power) to compute the outcome. Your paper refers to the
problem of predicting the change in the value of the Dow Jones
average. This depends on a vast number of different
variables -- including the utility functions and beliefs of many
different people -- that we have no hope of ever measuring. So how
are we to deal with these limitations we face, the uncertainty as to
the actual state of the system we are dealing with? Cox's axioms tell
us that we have no choice but to represent our partial state of
knowledge as a probability distribution, and then use the laws of
probability to derive our state of knowledge about the outcome.
Otherwise we are faced with logical inconsistencies or various
pathologies.

Even in those problems where QM effects are significant, often we
still cannot use the QM probabilities directly, because to do so
requires precise information about the quantum state of the system,
which we do not have. So here again, we must use probabilities to
represent our incomplete state of knowledge as to the system's quantum
state, and compute the probability of the outcome as
    P(outcome is o | I)
        = (SUM s:: P(quantum state is s | I) * P(outcome is o | s, I)),
where I is the information we have available.
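
Computationally this is nothing exotic; it is just a weighted sum
over the states we cannot distinguish. A minimal Python sketch, with
made-up states and numbers purely for illustration:

    # Marginalizing over an unknown state s:
    #   P(o | I) = SUM_s P(s | I) * P(o | s, I)
    # All numbers here are hypothetical.
    p_state = {"s1": 0.7, "s2": 0.3}     # P(s | I)
    p_outcome_given = {                  # P(o | s, I)
        "s1": {"up": 0.9, "down": 0.1},
        "s2": {"up": 0.2, "down": 0.8},
    }

    def p_outcome(o):
        return sum(p_state[s] * p_outcome_given[s][o] for s in p_state)

    print(p_outcome("up"))     # 0.7*0.9 + 0.3*0.2 = 0.69
    print(p_outcome("down"))   # 0.7*0.1 + 0.3*0.8 = 0.31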

2. There is some indication in your paper that our views can be
reconciled. You write such things as "if one is very sure of the
prior", and "if we have little information concerning P(t)", which to
my way of thinking are nonsensical phrases. However, if you were to
concentrate on conditional probabilities, and instead of speaking of
trying to determine the "real prior" you spoke of collecting
additional information on which to condition our probabilities, then we
would find more common ground.

With this in mind, I find the following statement from your paper
especially interesting:

> Finally, note that there might well be a way to embed the
> reasonableness / desiderata arguments often used by dob Bayesians to
> set priors inside a complete mathematical framework (e.g., there
> might be a framework which maps any (!) I to a unique prior
> distribution). If we had such a framework, *then* one might claim
> that such reasonableness arguments are a well-principled way to
> assign probabilities.

But this is exactly what researchers such as Jaynes have been doing!
To paraphrase Jaynes, it has only recently become clear that the
process of turning the information at hand into a probability
distribution is fully half of probability theory, and has not received
the attention it deserves. I am aware of two main tools that have
been developed: symmetry considerations, and the principle of maximum
entropy.

If you tell me that you have a card in your hand, that it is either
red or black, and that is *all* I know about the situation, then
symmetry forces me to assign a probability of 1/2 to the proposition
"the card is red". That is because, if we were to exchange the labels
"black" and "red", my state of knowledge would be unchanged. Symmetry
arguments are based on this kind of reasoning.

Since there appear to be several loosely related usages of the phrase
"maximum entropy" out there, let me make it clear that I am referring
to Jaynes's principle of maximum entropy. Suppose that the
information you have at hand can be expressed as the value of certain
expectations: you know that E[x[i]] = k[i], for 1 <= i <= n. Then
there are a variety of arguments for saying that the probability
distribution that represents exactly this information -- no more, and
no less -- is the one that satisfies the required expectations and,
among all distributions that do, has the greatest entropy.
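
As a concrete (and classic) illustration -- Jaynes's "Brandeis dice"
problem -- suppose all you know about a six-sided die is that its
average value is 4.5 rather than the symmetric 3.5. The
maximum-entropy distribution under a mean constraint has the
exponential form p(i) proportional to exp(lam*i), with lam chosen to
satisfy the constraint. A small pure-Python sketch of the mechanics
(my own, not anything from your paper):

    import math

    # Maximum-entropy distribution on faces 1..6 with E[X] = 4.5.
    # The solution has the form p(i) = exp(lam*i) / Z(lam); we find
    # lam by bisection, since the mean is increasing in lam.
    faces = range(1, 7)

    def mean(lam):
        w = [math.exp(lam * i) for i in faces]
        return sum(i * wi for i, wi in zip(faces, w)) / sum(w)

    lo, hi = -5.0, 5.0          # lam = 0 gives the symmetric mean 3.5
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if mean(mid) < 4.5:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0

    w = [math.exp(lam * i) for i in faces]
    p = [wi / sum(w) for wi in w]
    print(lam)                  # about 0.37
    print(p)                    # probabilities increase with face value

Note that with no constraint at all (lam = 0) the same formalism
returns the uniform distribution 1/6 for each face, which is exactly
the answer the symmetry argument above gives.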

As an example, if you measure the volume, pressure, and temperature of
a gas, this tells you various average properties of the molecules in
the gas, but nothing about any particular molecule. Combining
symmetry considerations (you have no information that distinguishes
any one molecule from any other) with the principle of maximum entropy
then gives you a probability distribution over the ensemble of
molecules in the gas. As Jaynes writes, "Indeed, virtually all known
thermodynamic relations, found over more than a century by the most
diverse and difficult kinds of physical reasoning and experimentation,
are now seen as straightforward mathematical identities of the Maximum
Entropy formalism. This makes it clear that those relations are
actually independent of any particular physical assumptions and are
properties of extended logic in general, giving us a new insight into
why the relations of thermodynamics are so general, independent of the
properties of any particular substance."

3. In your paper you write that "conventional Bayesian analysis
doesn't distinguish t [the true state of things] from g [one's guess
at the truth]." Yet you never support or elaborate on this statement,
so I have to guess :-) at why you say this. I find this statement
puzzling, since in a full Bayesian analysis you *don't* make any guess
as to the truth. Instead, you derive a posterior distribution for t.
How you use this posterior distribution depends on your goals.
Presumably you intend to use it to guide your actions, in which case
decision theory tells you to choose the action whose expected utility
-- computed under the posterior distribution for t -- is greatest.

Perhaps you are confusing maximum a posteriori (MAP) methods with a full
Bayesian analysis. In a full Bayesian analysis, you compute the
probability of an outcome o as
    P(o | I) = (SUM t:: P(t | I) * P(o | t, I)).
MAP is an engineering approximation to the above formula in which you
assume that P(t | I) has a sharp peak at some value t_max, and
approximate P(o | I) with P(o | t_max, I). You might then consider
t_max to be a guess at the
truth. Sometimes MAP is a reasonable approximation, and
sometimes it is not. One of the dangers of using MAP is that you will
tend to be overconfident in your predictions of the outcome, because
you have not incorporated your uncertainty about t into your prediction.
I happen to think that this particular approximation is often
carelessly used, without checking to see how sharply peaked the
posterior really is.
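
To make that danger concrete, here is a small numerical illustration
(my example, not anything from your paper): a Bernoulli parameter
with a uniform Beta(1,1) prior, after observing 3 successes in 3
trials.

    # Full Bayesian predictive vs. MAP plug-in for a Bernoulli
    # parameter t with a uniform Beta(1,1) prior, after observing
    # 3 successes in 3 trials. Posterior is Beta(4, 1).
    a, b = 1 + 3, 1 + 0

    # Full analysis: P(next = success | data) = posterior mean.
    p_full = a / (a + b)             # 4/5 = 0.8

    # MAP plug-in: t_max = mode of Beta(4,1) = (a-1)/(a+b-2) = 1.0,
    # so the plug-in predictive claims *certainty* of success after
    # only three observations.
    t_max = (a - 1) / (a + b - 2)
    p_map = t_max                    # 1.0

    print(p_full, p_map)             # 0.8 vs 1.0: MAP is overconfident

With 3000 successes in 3000 trials the two answers nearly coincide;
that sharply peaked regime is where MAP is a reasonable approximation.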

-------------------------------------------------------------------------
Kevin S. Van Horn
fonix Corporation