On Tue, 21 Aug 2001, Thomas Richardson wrote:
> > The easy way out is the approach of de Finetti that you described: we say
> > that our state of information I does not distinguish between permutations
> > of x[1..n] for any n, so that x[1..] is infinitely exchangeable (given I),
> > hence we know that *some* function f_I must exist satisfying the above
> > equations. In this view, theta is just an artifact of one possible way of
> > representing the joint distribution over x[1..n]; theta is *not* a
> > probability, it is just a parameter that happens to be numerically
> > identical to a related probability.
>
> I'm not sure I really see the distinction that you are making here - at
> least not without suspecting that you are committing the mind-projection
> fallacy :-)
Suppose that you have no information about the sequence x[1..] that would
differentiate x[1..n] from any permutation of x[1..n], for any n. In other
words, your state of information assigns the same probability to
x[1..n]=X[1..n] and x[1..n]=Y[1..n] whenever Y[1..n] is a permutation of
X[1..n], for all n. At this point we have not introduced any variable theta,
nor any distribution over theta, nor any probabilities conditioned on theta.
De Finetti proved that any probability distribution satisfying the above
constraint can be written in a form that is mathematically identical to what
we would obtain by first giving a probability distribution over theta, then
defining the x[i] to be mutually independent given theta, with
P(x[i]=1 | theta) = theta for all i. But none of this requires us to assign
any physical meaning to theta.
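For concreteness, here is a minimal numerical sketch of that representation
(the Beta(2, 5) mixing distribution and the function name p_sequence are my
own arbitrary choices for illustration, not anything forced by the theorem):

    # Python: an exchangeable distribution on x[1..n] in de Finetti form.
    # P(x[1..n]=X) depends only on the number of 1's in X, so every
    # permutation of X gets the same probability.
    from scipy import integrate
    from scipy.stats import beta

    def p_sequence(X, a=2.0, b=5.0):
        n, k = len(X), sum(X)
        # integral over theta of theta^k (1-theta)^(n-k) dF(theta),
        # with F = Beta(a, b) playing the role of the mixing distribution
        integrand = lambda t: t**k * (1.0 - t)**(n - k) * beta.pdf(t, a, b)
        return integrate.quad(integrand, 0.0, 1.0)[0]

    # every permutation of (1, 1, 0) gets the same probability:
    print(p_sequence([1, 1, 0]), p_sequence([1, 0, 1]), p_sequence([0, 1, 1]))

Note that theta appears here only as a variable of integration; nothing in
the sketch treats it as the probability of anything.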
> > If f_I is sufficiently smooth then we find that
> > P(x[1..n] = X[1..n] | I[N]) --> P(x[1..n] = X[1..n] | I) as N --> infinity,
> > for fixed n. (I think. To be honest, I haven't worked through all the
> > details, and it may be necessary to fiddle with the boundary probabilities.)
>
What is I[N] here?
I[N] is a state of information in which we have some notion of what the number
of 1's in x[1..N] might be -- as indicated by the probability distribution I
gave over this quantity, chosen to be asymptotically consistent with f_I --
and in which this is ALL the information we have that is relevant to the
problem. The point is that the original state of information I that we were
discussing -- which gave us an apparent probability distribution over a
probability -- is, for purposes of inference over x[1..n], where n << N, a
close approximation to I[N].
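To make the approximation concrete, here is a minimal sketch, assuming
(purely for illustration) that I[N] amounts to drawing x[1..n] without
replacement from N slots of which K hold a 1, where weights[K] gives the
probability that x[1..N] contains exactly K 1's:

    # Python: P(x[1..n]=X | I[N]) as an average over K of the probability
    # of drawing the particular ordered sequence X, without replacement,
    # from a population of N values containing exactly K ones.
    import math

    def p_sequence_finite(X, N, weights):
        n, k = len(X), sum(X)
        total = 0.0
        for K, w in enumerate(weights):
            if K >= k and N - K >= n - k:
                total += (w * math.perm(K, k) * math.perm(N - K, n - k)
                          / math.perm(N, n))
        return total

For n << N, sampling without replacement is nearly sampling with
replacement, so this tends to the integral form given earlier, which is one
way to see why P(x[1..n]=X[1..n] | I[N]) should approach
P(x[1..n]=X[1..n] | I).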
I am using the Jaynesian notion of "state of information" here, which he used
to mean everything we know that is relevant to the truth or falsity of the
propositions of interest to us. Jaynes always used conditional probabilities
P(A | X), with the right-hand-side X being a state of information instead of a
predicate. (We can, of course, further condition on a predicate B, writing
P(A | B, X), where "B, X" is the state of information we obtain by adding to X
the knowledge that B is true.) Jaynes was a Bayesian, but not a subjectivist;
his work on maximum entropy was aimed at providing non-subjective Bayesian
priors that can be said to precisely encode certain kinds of information.
You may feel uncomfortable with not having a precise, mathematical definition
of what a "state of information" is. From the logical viewpoint, it doesn't
matter that much; once we've given the axioms of probability theory, we can
manipulate expressions involving states of information without worrying about
their internal structure, any more than we worry about the internal structure
of integers. When it comes to proving the consistency of the axioms and rules
of probability theory, we must come up with some model for what a state
of information is; one choice that works is to define a state of information
as a set-theoretical probability distribution over an appropriate space. But
this is only one possible model, and we need not concern ourselves with this
interpretation when carrying out inferences.
> I don't really understand your point - isn't this just saying that we can
> act as if theta were a probability and no one will be any the wiser?
I'm saying that, although the mathematical form of P(x[1..n]=X[1..n] | I)
appears to involve higher-order probabilities, there is in fact no need to
introduce the conceptual baggage of higher-order probabilities and the
confusion that results from thinking of probabilities as physical
quantities. We can instead interpret theta as a particular concrete,
directly-observable quantity -- for instance, the limiting relative frequency
of 1's in x[1..] -- which turns our apparent higher-order probabilities back
into first-order probabilities.
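A minimal sketch of this reading (the Beta(2, 5) draw is again an arbitrary
illustration): take theta to be the long-run frequency of 1's, a quantity we
could in principle observe.

    # Python: theta as an observable limiting frequency.  Draw theta from
    # the mixing distribution, generate a long exchangeable sequence, and
    # watch the empirical frequency of 1's converge to theta.
    import random

    random.seed(0)
    theta = random.betavariate(2.0, 5.0)
    N = 100_000
    freq = sum(random.random() < theta for _ in range(N)) / N
    print(theta, freq)   # agree to within about 1/sqrt(N)

On this reading a distribution over theta is an ordinary first-order
distribution over an observable frequency, not a probability of a
probability.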