Re: Problem with the degree of belief interpretation

Kathryn Blackmond Laskey (klaskey@gmu.edu)
Fri, 31 Jul 1998 12:36:42 -0500

David,

I found your paper, Kevin's comments, and your response enjoyable and
enlightening. Thank you for circulating the paper. Here are a few comments
of my own.

1. On Page 1 you say: "As compelling as these reasons are, though, none
of them constitute a proof that Bayesian techniques perform better than
non-Bayesian techniques in the real world." Of course there is no proof!
As you well know, there are LOTS of theorems (including your own)
demonstrating, under differing kinds of assumptions, that you can't PROVE
an inductive procedure works. That's what makes it induction rather than
deduction. Intuitively it's easy to see why this is the case -- Reality
can always, if she chooses, introduce something totally unforeseen to muck
up our best models.

Actually, we should WELCOME this news! Most of the major advances in
science have come when our current models were "surprised" by something
totally unexpected (to us, if not to Reality). A good Bayesian learns to
"reserve a pinch of probability for the possibility that his model is
wrong" (a phrase due to Morris DeGroot).

A poorly conceived Bayesian analysis with dogmatic priors can always be
outperformed by a superior Bayesian or non-Bayesian analysis. What does
"superior" mean? If you believe in objective probabilities, "superior"
means one with good frequency properties with respect to the "true
probabilities." If you are a subjectivist, "superior" means one with a
"nondogmatic" prior, i.e., one that can learn a sufficiently complex model
to capture the phenomenon of interest.
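
To pin down "dogmatic" (my gloss, not a definition from your paper): a
dogmatic prior assigns zero probability to some region A of parameter space
that might contain the truth, and conditioning can never undo that, since

   P(\theta \in A \mid \text{data})
     = \frac{\int_A p(\text{data} \mid \theta)\, d\pi(\theta)}{p(\text{data})}
     = 0 \quad \text{whenever } \pi(A) = 0.

A nondogmatic prior gives every candidate region at least a "pinch of
probability," so the data can always pull the posterior toward it.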

2. My "reconciliation" would have a different flavor than yours. It goes
something like this:

- Give me a learning procedure that works well across a broad variety of
situations, under some reasonable performance metric, and I'll bet I (or
maybe someone a little smarter than I am) can demonstrate that it's
approximately decision theoretically optimal, if you allow me to include
all relevant factors (including computational tractability) in the utility
function and if I have an appropriate prior distribution;

- Give me a learning procedure that works well across a broad variety of
situations, under some reasonable performance metric, and I'll bet I can
demonstrate that it has good frequency properties;

- One way to view procedures with good frequency properties is that they
define algorithms to be applied to problems in some broadly defined class,
where for reasons of practical implementation you want to pretend a priori
that you know nothing about the problem except that it falls into the class
of interest. Such procedures will be approximately decision theoretically
optimal when it is too costly (by whatever measure of cost you care to
apply) to hand-tailor a full-blown Bayesian analysis to an individual
problem.

Many people have made their careers proving that their favorite Bayesian
technique has good frequency properties. Others have made their careers
proving that some popular frequentist method is approximately Bayesian.
This is a good thing to do because it helps us understand why these
procedures work and when they might break down.
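
For concreteness, the bridge between these two ways of being "good" can be
written in standard decision-theoretic notation (my framing, not anything
in your paper). The frequency properties of a procedure \delta live in its
risk function, and the Bayesian judges \delta by that same risk averaged
over a prior:

   R(\theta, \delta) = \mathbb{E}_{x \mid \theta}[\, L(\theta, \delta(x)) \,]
       \qquad \text{(frequentist risk)}

   r(\pi, \delta) = \int R(\theta, \delta) \, d\pi(\theta)
       \qquad \text{(Bayes risk)}

A Bayes rule minimizes r(\pi, \delta), and the complete class theorems say,
roughly and under regularity conditions, that admissible frequentist rules
are Bayes rules or limits of Bayes rules -- which is why results in both
directions keep turning up.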

3. Currently, the name of the game for Bayesians is identifying "robust
classes of priors" for high-dimensional problems. What do I mean by that?
In very high-dimensional problems both traditional Bayesian and traditional
frequentist procedures break down because there is not "enough data" for
the number of parameters, and the kinds of prior assumptions traditionally
used in Bayesian models don't work well. The kind of prior information
that seems very useful to encode is something like: "there is probably a
lot of conditional independence" (i.e., the effective dimensionality of the
parameter space is much smaller than the apparent dimension as we've
parameterized the problem). It seems that if you can encode this knowledge
without imposing stringent prior restrictions on which specific constraints
are "tight," then you get excellent learning performance. Thinking like a
Bayesian helps me, anyway, to get good insights into what I'm doing from a
theoretical perspective when I design these kinds of algorithms, so it
helps me to be a better engineer. If it doesn't help you, then fine.
Decision theory itself would tell you it's suboptimal for you to think like
a Bayesian if that mode of thinking doesn't work for you.
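
One standard way to write down a prior with that "most parameters probably
don't matter, but I won't say which" flavor is a spike-and-slab mixture.
The little Python sketch below is purely illustrative (my example, not a
method from your paper): it draws coefficient vectors whose effective
dimensionality is tiny even though the nominal dimension is huge.

import numpy as np

rng = np.random.default_rng(1)

def spike_and_slab_sample(dim, p_active=0.05, slab_sd=2.0):
    """Draw one coefficient vector from a spike-and-slab prior.

    Each coordinate is exactly zero (the "spike") with probability
    1 - p_active, and is drawn from a wide Normal (the "slab") with
    probability p_active.  The prior says "few coordinates matter"
    without naming which ones.
    """
    active = rng.random(dim) < p_active
    return np.where(active, rng.normal(0.0, slab_sd, size=dim), 0.0)

theta = spike_and_slab_sample(dim=10_000)
print("nominal dimension:", theta.size)
print("effective dimension (nonzero coordinates):", int((theta != 0).sum()))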

4. Whether there "really is" randomness or whether it's "only" a product
of incomplete information is a matter of religion, not science.
Exchangeability is formally equivalent to the observations being iid given
an unknown "true probability." That means the strict subjectivist and the
dyed-in-the-wool
frequentist are operating from observationally equivalent models. I love
debating philosophy, but let's be clear that we're talking metaphysics, in
that there is no observation that could distinguish the two viewpoints.
Let's also be clear that BOTH views are ONLY MODELS, and that Reality is
richer than ANY of our models.
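
For readers who haven't seen it, the statement behind "formally equivalent"
is de Finetti's representation theorem: for an infinite exchangeable
sequence of 0/1 observations there is some distribution \mu over \theta
such that, for every n,

   P(X_1 = x_1, \ldots, X_n = x_n)
     = \int_0^1 \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} \, d\mu(\theta).

The sequence behaves exactly as if it were iid Bernoulli(\theta) with
\theta drawn once from \mu. The subjectivist reads \mu as a prior over a
convenient fiction; the frequentist reads \theta as the "true probability."
No observation can tell those readings apart.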

5. You say that the identification of P(g|d) with P(t|d) by the Bayesian is
an "extra unjustified assumption." To the subjectivist, it's the
frequentist that makes an "extra unjustified assumption." To the
subjectivist, each observer has his/her own P_o(t|d), which can all be
different (you would probably prefer the notation P(g_o|d) to emphasize
that all the observers are guessing, but it's the same thing in different
notation). The assumption that there IS a "real, objective" P(t|d) that
all these subjectivists are approximating is an "extra unjustified
assumption" to the subjectivist. Also, note that the subjectivist is not
necessarily assuming a link between g and t. All she has is her guess and
Bayesian conditioning and the hope that the "messages" she gets from
Reality will provide her with enough information to learn to guess better
in the future. If all our subjectivists are using procedures with good
frequency properties with respect to a model that is complex enough to
approximate what Nature is "really doing" (whether complex and
deterministic or "really random" or something else -- i.e. "free will,"
whatever that "really means"), then all the subjectivists will eventually
come to agree pretty closely with one another on observables. We can
characterize the kinds of problems any given learning procedure is capable
of learning. But keep in mind there is no reason to suppose subjectivists
and frequentists will ever come to agreement on observationally
indistinguishable metaphysics.
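
Here is a toy version of "come to agree on observables" (my illustration,
assuming a simple coin-flipping world, again in Python): two subjectivists
with very different, but nondogmatic, Beta priors watch the same flips, and
their predictive probabilities for the next flip converge.

import numpy as np

rng = np.random.default_rng(2)
true_p = 0.3
flips = rng.random(2000) < true_p   # what Reality actually sends

# Two observers with different (but nondogmatic) Beta(a, b) priors on the bias.
priors = {"A": (1.0, 1.0),    # uniform: no strong opinion
          "B": (20.0, 2.0)}   # starts out confident the coin favors heads

for n in (0, 10, 100, 2000):
    heads = int(flips[:n].sum())
    # Posterior predictive P(next flip = heads) = (a + heads) / (a + b + n).
    pred = {name: (a + heads) / (a + b + n) for name, (a, b) in priors.items()}
    print(f"n={n:5d}  A={pred['A']:.3f}  B={pred['B']:.3f}  "
          f"|A-B|={abs(pred['A'] - pred['B']):.3f}")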

6. You say, "we can self-consistently imagine that the universe evolves in
accord with equations governing 'absolute, objective' probabilities..." I
perfectly agree. Your view of the world IS self-consistent. But that
doesn't make it right. The strict subjectivist's view of the world is also
self-consistent, and it's also not necessarily right. I personally am
agnostic with regard to whether there are "real" probabilities in Nature.
I find it useful to think that way sometimes. But maybe what look to us
like "real" probabilities are just an artifact of incomplete information
and sensitive dependence on initial conditions. Or maybe these
"probabilities" arise out of "real free will" in the universe. How would
you model "real free will?" I have this nagging feeling I just can't get
rid of that I HAVE it, even if I can't model it. Economists and game
theorists model it with probability, which seems to work pretty well when
the structure of the game the agents are playing drives them to predictable
"mixed" strategies. When this is not the case, NO ONE can do a "good" job
in the sense of a procedure with good frequency properties! Whether
economic agents are "really" deterministic, random, or have free will doesn't
matter to the economist's model. There's the famous line of Ron Howard that
my behavior is free will to me and random to you.

7. Maybe when we die God will tell us "the truth" about the metaphysics we
are debating. If so, I bet "the truth" will surprise ALL of us! In the
meantime, it's fun to have these discussions, but don't expect to convince
anyone you are "right" about observationally indistinguishable metaphysical
assumptions. The game I like MUCH better is exploring the implications of
different metaphysics to see what they suggest about observables, to
understand how different metaphysics relate to each other, to see whether
what "makes sense" under one metaphysic can be translated to another
framework and perhaps become the key to unlock what was thought to be an
intractable puzzle, and the like.

Kathy Laskey