[UAI] "Higher-order" probabilities

From: Kevin S. Van Horn (Kevin_VanHorn@ndsu.nodak.edu)
Date: Tue Aug 21 2001 - 09:06:15 PDT

    On Thu, 2 Aug 2001, Kathryn Blackmond Laskey wrote:

    > If probabilities apply only to exclusive and exhaustive propositions
    > satisfying the clarity test, then to what exactly does a higher-order
    > probability refer? Do we have to commit to the existence of definite
    > "true" probabilities our hypothetical clairvoyant could "know," that
    > we are "measuring" when we observe random variables?

    Jaynes would call any discussion of "true" probabilities an example of the
    mind-projection fallacy: the fallacy of treating constructs that merely
    represent one's own state of information as if they were properties of
    the external physical world. So the Jaynesian answer to your second question
    is no.

    For the first question, let's look at a specific, concrete example: a
    sequence of coin flips from a coin with an unknown bias. Let x[i] be 0
    if the i-th coin flip is tails and 1 if it is heads, let theta be the
    (unknown) bias parameter, and let I represent our state of
    information. Formally, for any n > 0,

    - P(x[1..n] = X[1..n] | I) =
      (INTEGRAL Theta: 0 <= Theta <= 1:
         P(x[1..n] = X[1..n] | theta = Theta, I) * f_I(Theta)),
      where f_I is a probability density dependent on I.

    - P(x[1..n] = X[1..n] | theta = Theta, I) =
      (PRODUCT i: 1 <= i <= n: P(x[i] = X[i] | theta = Theta, I)).
      That is, the x[i] are independent given theta = Theta and I.

    - P(x[i] = 1 | theta = Theta, I) = Theta.
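
    For concreteness, here's a small Python sketch of these three
    equations. It's only illustrative: it assumes a uniform prior
    f_I(theta) = 1 on [0, 1], and the function names are my own
    invention.

      import math

      def seq_prob_given_theta(X, theta):
          # P(x[1..n] = X[1..n] | theta = Theta, I): the flips are
          # independent Bernoulli(theta) trials given theta.
          p = 1.0
          for x in X:
              p *= theta if x == 1 else 1.0 - theta
          return p

      def seq_prob(X, f=lambda t: 1.0, steps=100000):
          # P(x[1..n] = X[1..n] | I): integrate out theta against the
          # prior density f_I, using the midpoint rule on [0, 1].
          h = 1.0 / steps
          return sum(seq_prob_given_theta(X, (j + 0.5) * h)
                     * f((j + 0.5) * h) * h for j in range(steps))

      X = [1, 0, 1, 1]
      k, n = sum(X), len(X)
      print(seq_prob(X))           # numerical value of the integral
      # For the uniform prior the integral has the closed form
      # k! (n - k)! / (n + 1)!  (here 0.05):
      print(math.factorial(k) * math.factorial(n - k)
            / math.factorial(n + 1))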

    Here theta is an example of what intuitively seems to be a "higher-order
    probability", and the question is, "what the heck does that really mean?"

    The easy way out is the approach of de Finetti that you described: we say that
    our state of information I does not distinguish between permutations of
    x[1..n] for any n, so that x[1..] is infinitely exchangeable (given I), hence
    we know that *some* function f_I must exist satisfying the above equations.
    In this view, theta is just an artifact of one possible way of representing
    the joint distribution over x[1..n]; theta is *not* a probability, it is just
    a parameter that happens to be numerically identical to a related probability.
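
    As a quick sanity check on the exchangeability claim, note that
    seq_prob from the sketch above assigns the same probability to every
    reordering of a sequence; the answer depends only on the number of
    1's, not on their order:

      from itertools import permutations

      # All distinct orderings of three 1's and one 0 get the same
      # marginal probability, as exchangeability requires.
      for perm in sorted(set(permutations([1, 0, 1, 1]))):
          print(perm, round(seq_prob(list(perm)), 6))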

    You don't seem to find that answer terribly satisfying, and I'm not sure I do
    either, so here's another way of resolving the question: theta is the
    fraction of 1's in the sequence x[1..N], for large N, i.e.

      theta = (SUM i: 1 <= i <= N: x[i]) / N.

    Our prior distribution over theta then describes our uncertainty as to
    the fraction of x[i] values that are one.
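
    To illustrate that reading numerically: draw theta from a uniform
    prior, simulate a long run of flips, and compare the observed
    fraction of 1's with the drawn theta. (Again just a sketch; the
    agreement is on the order of 1/sqrt(N).)

      import random

      random.seed(0)
      theta = random.random()    # theta drawn from a Uniform(0, 1) prior
      N = 100000
      k = sum(1 for _ in range(N) if random.random() < theta)
      print(theta, k / N)        # the observed fraction tracks theta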

    To be more precise, suppose that we define

    1. P(x[1..N] = X[1..N] | theta = k/N, I[N])
       = 1 / C(N, k) if (SUM i: 1 <= i <= N: X[i]) = k
       = 0 otherwise;

    2. P(theta = k/N | I[N]) = (INTEGRAL t: k/N <= t <= (k + 1)/N: f_I(t)).

    (1) just says that I[N] contains no relevant information once theta is known,
    so by the principle of maximum entropy we assign equal probabilities to all
    possibilities consistent with theta = k/N.

    (2) makes P(theta = k/N | I[N]) and f_I consistent.

    If f_I is sufficiently smooth then we find that P(x[1..n] = X[1..n] |
    I[N]) --> P(x[1..n] = X[1..n] | I) as N --> infinity, for fixed n.
    (I think. To be honest, I haven't worked through all the details, and
    it may be necessary to fiddle with the boundary probabilities.)
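
    Here's a rough numerical check of that limit, once more assuming the
    uniform prior. It implements (1) and (2) directly, dropping the
    boundary point theta = 1 (one simple way of fiddling with the
    boundary probabilities); the values do appear to converge to the
    integral.

      from math import comb, factorial

      def prefix_prob(X, N):
          # P(x[1..n] = X[1..n] | I[N]).  With the uniform prior,
          # definition (2) gives P(theta = k/N | I[N]) = 1/N.  Given
          # theta = k/N, definition (1) makes all C(N, k) sequences with
          # k ones equally likely, so a fixed prefix holding j ones has
          # probability C(N - n, k - j) / C(N, k).
          n, j = len(X), sum(X)
          return sum((1.0 / N) * comb(N - n, k - j) / comb(N, k)
                     for k in range(j, N))  # k = N dropped at the boundary

      X = [1, 0, 1, 1]
      n, j = len(X), sum(X)
      limit = factorial(j) * factorial(n - j) / factorial(n + 1)
      for N in (10, 100, 1000, 10000):
          print(N, prefix_prob(X, N), limit)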


