On Mon, 30 Jul 2001, Michael Jordan wrote:
> Back on the general discussion, I think that it's important that Milan
> has pointed out some of the non-trivial issues that arise in making
> the notion of "conditional probability" rigorous. But even without
> getting into pathologies, one can see that some thought is required in
> order to handle continuous variables with any degree of honesty.
From my reading of his work, I believe Jaynes would have said that a
conditional probability P(A | B = b, X), where B is a continuous variable,
is not well-defined until you also specify the limiting process. That is,
we take P(A | B in nbr(b, eps), X) as fundamental, where nbr(b, eps) is
some neighborhood of b that shrinks to a point as eps -> 0, and simply
use P(A | B = b, X) as a shorthand for
lim_{eps -> 0} P(A | B in nbr(b, eps), X).
Thus, the "correct" definition of P(A | B = b, X) is problem-dependent,
depending on what limiting process is appropriate for the problem at hand.
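To make the shorthand concrete, here is a small numerical sketch (mine,
not Jaynes's; the bivariate-normal setup and all variable names are my
own illustrative choices). It estimates P(X > 0 | B in nbr(b, eps)) for a
shrinking interval neighborhood and compares the estimates to the exact
conditional at B = b:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    rho, b = 0.8, 1.0
    n = 2_000_000

    # Sample (X, B) from a standard bivariate normal with correlation rho.
    B = rng.standard_normal(n)
    X = rho * B + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    # Estimate P(X > 0 | B in (b - eps, b + eps)) for shrinking eps.
    for eps in [1.0, 0.3, 0.1, 0.03]:
        in_nbr = np.abs(B - b) < eps
        print(eps, (X[in_nbr] > 0).mean())

    # The limiting value P(X > 0 | B = b) = Phi(rho * b / sqrt(1 - rho^2)).
    print("limit:", norm.cdf(rho * b / np.sqrt(1 - rho**2)))

For a smooth joint density like this one, the interval neighborhood is
the natural limiting process and the estimates converge to the usual
conditional; the sphere example below shows what happens when the choice
of neighborhood is not so innocent.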
The standard example is to take a uniform distribution over a sphere
and consider the conditional distribution when we restrict ourselves to a
particular great circle. If the great circle runs along the equator, it
seems obvious that the conditional distribution is uniform. On the other
hand, if the great circle passes through the poles, controversy arises.
The reason for this controversy is that there are TWO
obvious limiting processes to define the conditional probability, and
these give very different answers. One limiting process takes the
neighborhood of a point on the sphere to be all points lying within a
geodesic distance eps of that point. This gives a conditioning region that is a band
centered on the great circle, and we take the limit as the width of this
band goes to zero. The second limiting process takes the neighborhood of
a point on the sphere to be all points with the same latitude and a
longitude that differs by at most eps. This gives a conditioning region
that is a pair of "orange slices" connected at the tips (at the poles).
The first limiting process gives a uniform conditional distribution,
whereas the second does not (the density is proportional to the cosine of
the latitude, so it goes to zero at the poles).
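For readers who like to see the two limiting processes side by side, here
is a Monte Carlo sketch (my own, not from Jaynes; the names and the
choice eps = 0.05 are purely illustrative). It conditions uniform samples
on the sphere first on the geodesic band around the meridian lying in the
plane y = 0, then on the longitude wedge, and histograms the latitude in
each case:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2_000_000
    eps = 0.05

    # Uniform points on the unit sphere (normalize 3-D Gaussians).
    p = rng.standard_normal((n, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    x, y, z = p.T
    lat = np.arcsin(z)            # latitude in [-pi/2, pi/2]
    lon = np.arctan2(y, x)        # longitude in (-pi, pi]

    # Condition on the great circle through the poles in the plane y = 0.
    band = np.abs(np.arcsin(y)) < eps                          # distance band
    wedge = np.minimum(np.abs(lon), np.pi - np.abs(lon)) < eps # orange slices

    # Conditional distribution of latitude under each limiting process.
    bins = np.linspace(-np.pi / 2, np.pi / 2, 7)
    for name, cond in [("band", band), ("wedge", wedge)]:
        hist, _ = np.histogram(lat[cond], bins=bins, density=True)
        print(name, np.round(hist, 2))

The band histogram comes out roughly flat, while the wedge histogram
falls off toward the poles -- exactly the disagreement described above.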
Jaynes has a discussion of this issue in the (unfinished) book previously
mentioned (http://bayes.wustl.edu).
> As a general remark on some of the discussions on probability theory
> that recur on the UAI list, I think that it's important to emphasize
> that probability theory is best viewed as a special case of measure
> theory,
Let me present another view, again based on Jaynes's ideas. The title of
his book is "Probability Theory: The Logic of Science." Jaynes viewed
probability theory primarily as a logic of plausible inference. So let's
take a look at this from the perspective of mathematical logic. (This is
my own elaboration of the Jaynesian view.) The product and sum rules of
probability theory give us the proof theory of our logic.
Set-theoretic probability theory gives us the model theory for our logic.
That is, it allows us to construct sets of axioms (e.g., a set of
conditional probabilities defining a joint probability distribution over
the variables of interest) that are consistent, so that we may avoid
reasoning from inconsistent premises.
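As a toy illustration of this division of labor (my own example; the
variable names are made up), the joint table below plays the role of a
consistent model, and the inference itself is just mechanical application
of the sum and product rules:

    import numpy as np

    # A tiny "model": a consistent joint distribution over two binary
    # variables, Rain and Sprinkler. Rows index Rain, columns Sprinkler.
    joint = np.array([[0.40, 0.30],
                      [0.25, 0.05]])
    assert np.isclose(joint.sum(), 1.0)   # consistency of the "axioms"

    # Proof theory: crank the sum and product rules mechanically.
    p_rain = joint.sum(axis=1)                    # sum rule: P(R) = sum_S P(R, S)
    p_spr_given_rain = joint / p_rain[:, None]    # product rule: P(S | R) = P(R, S) / P(R)

    print(p_rain)             # [0.70 0.30]
    print(p_spr_given_rain)   # each row sums to 1

Nothing in the inference step cares where the table came from; making
sure such a table exists and is consistent is the model theory's job.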
This distinction, I believe, cleans up the conceptual landscape quite a
bit. For example, there was some discussion on this list recently about
the definition of a random variable, and the fact that a random variable's
definition changes if we enlarge the sample space. The Jaynesian
viewpoint is that there are no "random" variables -- there are only
variables whose values may not be known with certainty, and there is no
logical distinction between these and any other variable. Only at the
model theory level, when we concern ourselves with proving consistency,
do we have to define the notion of a random variable, sample space, etc.
Thus, measure theory helps us build consistent probabilistic models
involving continuous variables, but once these are defined, we may ignore
its subtleties and crank through the simple logical rules of probability
theory to carry out our inferences (assuming that we follow Jaynes's
policy with regard to infinite sets).