Re: [UAI] "Higher-order" probabilities

From: Kevin S. Van Horn (Kevin_VanHorn@ndsu.nodak.edu)
Date: Thu Aug 23 2001 - 10:43:01 PDT

    On Tue, 21 Aug 2001, Thomas Richardson wrote:

    > > The easy way out is the approach of de Finetti that you described: we say
    > > that our state of information I does not distinguish between permutations
    > > of x[1..n] for any n, so that x[1..] is infinitely exchangeable (given I),
    > > hence we know that *some* function f_I must exist satisfying the above
    > > equations. In this view, theta is just an artifact of one possible way of
    > > representing the joint distribution over x[1..n]; theta is *not* a
    > > probability, it is just a parameter that happens to be numerically
    > > identical to a related probability.
    >
    > I'm not sure I really see the distinction that you are making here - at
    > least not without suspecting that you are making the mind-projection
    > fallacy :-)

    Suppose that you have no information about the sequence x[1..] that would
    differentiate x[1..n] from any permutation of x[1..n], for any n. In other
    words, your state of information assigns the same probability to
    x[1..n]=X[1..n] and x[1..n]=Y[1..n] whenever Y[1..n] is a permutation of
    X[1..n], for all n. At this point we have not introduced any variable theta,
    nor any distribution over theta, nor any probabilities conditioned on theta.
    De Finetti proved that any probability distribution satisfying the above
    constraint can be written in a form that is mathematically identical to what
    we would obtain by first giving a probability distribution over theta, then
    defining x[i] and x[j] (i != j) to be independent given theta, with
    P(x[i]=1 | theta) = theta for all i. But none of this requires us to assign
    any physical meaning to theta.
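
    Written out (this is the standard statement of de Finetti's representation
    theorem for binary sequences, with the mixing distribution written f_I to
    match the earlier messages):

        P(x[1..n] = X[1..n] | I) = \int_0^1 \theta^k (1 - \theta)^{n-k} \, df_I(\theta)

    where k is the number of 1's in X[1..n]. On the right-hand side, theta is
    just a variable of integration; nothing forces us to read it as a
    probability.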

    > > If f_I is sufficiently smooth then we find that P(x[1..n] = X[1..n] |
    > > I[N]) --> P(x[1..n] = X[1..n] | I) as N --> infinity, for fixed n.
    > > (I think. To be honest, I haven't worked through all the details, and
    > > it may be necessary to fiddle with the boundary probabilities.)
    >
    > What is I[N] here ?

    I[N] is a state of information in which we have some notion of what the number
    of 1's in x[1..N] might be --- as indicated by the probability distribution I
    gave over this quantity, chosen to be asymptotically consistent with f_I ---
    and in which this is ALL the information we have that is relevant to the
    problem. The point is that the original state of information I that we were
    discussing -- which gave us an apparent probability distribution over a
    probability -- is, for purposes of inference over x[1..n], where n << N, a
    close approximation to I[N].
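
    To make "close approximation" concrete, here is a small numerical sketch
    (my own illustration, not from the thread): take f_I to be a Beta(2,2)
    density, let I[N] put weight proportional to f_I(K/N) on "K ones in
    x[1..N]" (one simple way to be asymptotically consistent with f_I), and
    compare the resulting pattern probabilities against the de Finetti
    integral. The Beta(2,2) choice and the discretization scheme are
    assumptions made purely for illustration.

        from math import comb, factorial

        def beta_fn(a, b):
            # Beta function B(a, b) for positive integer arguments.
            return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)

        def f_I(t):
            # Assumed mixing density: Beta(2, 2), i.e. 6 t (1 - t).
            return 6.0 * t * (1.0 - t)

        def p_given_IN(N, n, k):
            # P(x[1..n] = X[1..n] | I[N]) for a fixed pattern X with k ones:
            # mix over K, the number of 1's in x[1..N]. Given K, exchangeability
            # makes every arrangement of the K ones equally likely, so a fixed
            # pattern in the first n slots gets hypergeometric weight
            # C(N-n, K-k) / C(N, K).
            weights = [f_I(K / N) for K in range(N + 1)]
            Z = sum(weights)
            total = 0.0
            for K in range(N + 1):
                if k <= K and n - k <= N - K:
                    total += (weights[K] / Z) * comb(N - n, K - k) / comb(N, K)
            return total

        def p_given_I(n, k):
            # The de Finetti integral int_0^1 theta^k (1-theta)^(n-k) f_I(theta) dtheta,
            # which equals 6 * B(k+2, n-k+2) for the Beta(2, 2) density above.
            return 6.0 * beta_fn(k + 2, n - k + 2)

        for N in (20, 200, 2000):
            print(N, p_given_IN(N, n=5, k=2), p_given_I(n=5, k=2))

    The two printed columns should agree ever more closely as N grows, which
    is the sense in which I[N] approximates I for inference about x[1..n]
    with n << N.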

    I am using the Jaynesian notion of "state of information" here, which Jaynes
    used to mean everything we know that is relevant to the truth or falsity of the
    propositions of interest to us. Jaynes always used conditional probabilities
    P(A | X), with the right-hand-side X being a state of information instead of a
    predicate. (We can, of course, further condition on a predicate B, writing
    P(A | B, X), where "B, X" is the state of information we obtain by adding to X
    the knowledge that B is true.) Jaynes was a Bayesian, but not a subjectivist;
    his work on maximum entropy was aimed at providing non-subjective Bayesian
    priors that can be said to precisely encode certain kinds of information.
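
    (For example, the product rule in this notation reads

        P(A, B | X) = P(A | B, X) P(B | X)

    with the state of information X carried along on the right of the bar
    throughout.)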

    You may feel uncomfortable with not having a precise, mathematical definition
    of what a "state of information" is. From the logical viewpoint, it doesn't
    matter that much; once we've given the axioms of probability theory, we can
    manipulate expressions involving states of information without worrying about
    their internal structure, any more than we worry about the internal structure
    of integers. When it comes to proving the consistency of the axioms and rules
    of probability theory, we must come up with some model for what a state
    of information is; one choice that works is to define a state of information
    as a set-theoretical probability distribution over an appropriate space. But
    this is only one possible model, and we need not concern ourselves with this
    interpretation when carrying out inferences.

    > I don't really understand your point - isn't this just saying that we can
    > act as if theta were a probability and no one will be any the wiser?

    I'm saying that, although the mathematical form of P(x[1..n]=X[1..n] | I)
    appears to involve higher-order probabilities, there is in fact no need to
    introduce the conceptual baggage of higher-order probabilities and the
    confusion that results from thinking of probabilities as physical
    quantities. We can instead interpret theta as a particular concrete,
    directly observable quantity, which turns our apparent higher-order
    probabilities back into first-order probabilities.
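
    One standard way to cash this out (my gloss, though it is the usual one in
    discussions of de Finetti's theorem) is to identify theta with the limiting
    relative frequency of 1's, which exists with probability one for an
    infinitely exchangeable sequence:

        \theta = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x[i]

    A distribution over theta is then an ordinary first-order distribution over
    an (idealized) observable, not a probability of a probability.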


