[UAI] "Higher-order" probabilities

From: Kevin S. Van Horn (Kevin_VanHorn@ndsu.nodak.edu)
Date: Tue Aug 21 2001 - 09:06:15 PDT

    On Thu, 2 Aug 2001, Kathryn Blackmond Laskey wrote:

    > If probabilities apply only to exclusive and exhaustive propositions
    > satisfying the clarity test, then to what exactly does a higher-order
    > probability refer? Do we have to commit to the existence of definite
    > "true" probabilities our hypothetical clairvoyant could "know," that
    > we are "measuring" when we observe random variables?

    Jaynes would call any discussion of "true" probabilities an example of the
    mind-projection fallacy: the fallacy of treating constructs that merely
    represent one's own state of information as if they were properties of
    the external physical world. So the Jaynesian answer to your second question
    is no.

    For the first question, let's look at a specific, concrete example: a
    sequence of coin flips from a coin with an unknown bias. Let x[i] be 0
    if the i-th coin flip is tails and 1 if it is heads, let theta be the
    (unknown) bias parameter, and let I represent our state of
    information. Formally, for any n > 0,

    - P(x[1..n] = X[1..n] | I) =
      (INTEGRAL Theta: 0 <= Theta <= 1:
         P(x[1..n] = X[1..n] | theta = Theta, I) * f_I(Theta)),
      where f_I is a probability density dependent on I.

    - P(x[1..n] = X[1..n] | theta = Theta, I) =
      (PRODUCT i: 1 <= i <= n: P(x[i] = X[i] | theta = Theta, I)).
      That is, the x[i] are independent given theta = Theta and I.

    - P(x[i] = 1 | theta = Theta, I) = Theta.
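
    For concreteness, here's a small Python sketch of these three
    equations. It's only illustrative: it assumes a uniform prior
    f_I(theta) = 1 on [0, 1], and the function names are my own
    invention.

      import math

      def seq_prob_given_theta(X, theta):
          # P(x[1..n] = X[1..n] | theta = Theta, I): the flips are
          # independent Bernoulli(theta) trials given theta.
          p = 1.0
          for x in X:
              p *= theta if x == 1 else 1.0 - theta
          return p

      def seq_prob(X, f=lambda t: 1.0, steps=100000):
          # P(x[1..n] = X[1..n] | I): integrate out theta against the
          # prior density f_I, using the midpoint rule on [0, 1].
          h = 1.0 / steps
          return sum(seq_prob_given_theta(X, (j + 0.5) * h)
                     * f((j + 0.5) * h) * h for j in range(steps))

      X = [1, 0, 1, 1]
      k, n = sum(X), len(X)
      print(seq_prob(X))           # numerical value of the integral
      # For the uniform prior the integral has the closed form
      # k! (n - k)! / (n + 1)!  (here 0.05):
      print(math.factorial(k) * math.factorial(n - k)
            / math.factorial(n + 1))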

    Here theta is an example of what intuitively seems to be a "higher-order
    probability", and the question is, "what the heck does that really mean?"

    The easy way out is the approach of de Finetti that you described: we say that
    our state of information I does not distinguish between permutations of
    x[1..n] for any n, so that x[1..] is infinitely exchangeable (given I), hence
    we know that *some* function f_I must exist satisfying the above equations.
    In this view, theta is just an artifact of one possible way of representing
    the joint distribution over x[1..n]; theta is *not* a probability, it is just
    a parameter that happens to be numerically identical to a related probability.
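
    As a quick sanity check on the exchangeability claim, note that
    seq_prob from the sketch above assigns the same probability to every
    reordering of a sequence; the answer depends only on the number of
    1's, not on their order:

      from itertools import permutations

      # All distinct orderings of three 1's and one 0 get the same
      # marginal probability, as exchangeability requires.
      for perm in sorted(set(permutations([1, 0, 1, 1]))):
          print(perm, round(seq_prob(list(perm)), 6))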

    You don't seem to find that answer terribly satisfying, and I'm not sure I do
    either, so here's another way of resolving the question: theta is the
    fraction of 1's in the sequence x[1..N], for large N, i.e.

      theta = (SUM i: 1 <= i <= N: x[i]) / N.

    Our prior distribution over theta then describes our uncertainty as to
    the fraction of x[i] values that are one.
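
    To illustrate that reading numerically: draw theta from a uniform
    prior, simulate a long run of flips, and compare the observed
    fraction of 1's with the drawn theta. (Again just a sketch; the
    agreement is on the order of 1/sqrt(N).)

      import random

      random.seed(0)
      theta = random.random()    # theta drawn from a Uniform(0, 1) prior
      N = 100000
      k = sum(1 for _ in range(N) if random.random() < theta)
      print(theta, k / N)        # the observed fraction tracks theta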

    To be more precise, suppose that we define

    1. P(x[1..N] = X[1..N] | theta = k/N, I[N])
       = 1 / C(N, k) if (SUM i: 1 <= i <= N: X[i]) = k
       = 0 otherwise;

    2. P(theta = k/N | I[N]) = (INTEGRAL t: k/N <= t <= (k + 1)/N: f_I(t)).

    (1) just says that I[N] contains no relevant information once theta is known,
    so by the principle of maximum entropy we assign equal probabilities to all
    possibilities consistent with theta = k/N.

    (2) makes P(theta = k/N | I[N]) and f_I consistent.

    If f_I is sufficiently smooth then we find that P(x[1..n] = X[1..n] |
    I[N]) --> P(x[1..n] = X[1..n] | I) as N --> infinity, for fixed n.
    (I think. To be honest, I haven't worked through all the details, and
    it may be necessary to fiddle with the boundary probabilities.)
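
    Here's a rough numerical check of that limit, once more assuming the
    uniform prior. It implements (1) and (2) directly, dropping the
    boundary point theta = 1 (one simple way of fiddling with the
    boundary probabilities); the values do appear to converge to the
    integral.

      from math import comb, factorial

      def prefix_prob(X, N):
          # P(x[1..n] = X[1..n] | I[N]).  With the uniform prior,
          # definition (2) gives P(theta = k/N | I[N]) = 1/N.  Given
          # theta = k/N, definition (1) makes all C(N, k) sequences with
          # k ones equally likely, so a fixed prefix holding j ones has
          # probability C(N - n, k - j) / C(N, k).
          n, j = len(X), sum(X)
          return sum((1.0 / N) * comb(N - n, k - j) / comb(N, k)
                     for k in range(j, N))  # k = N dropped at the boundary

      X = [1, 0, 1, 1]
      n, j = len(X), sum(X)
      limit = factorial(j) * factorial(n - j) / factorial(n + 1)
      for N in (10, 100, 1000, 10000):
          print(N, prefix_prob(X, N), limit)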


