Re: [UAI] Definition of Bayesian network

From: Michael Jordan (jordan@cs.berkeley.edu)
Date: Mon Jul 30 2001 - 07:49:07 PDT

    > There are obvious cases that can't be represented by a belief network
    > (Bayesian network). These are when there are uncountably many variables
    > (a belief network assumes an enumeration of variables). For example,
    > think of my position at time T as a variable for each time T. It is not
    > unreasonable to model T as the reals (which are not enumerable). This
    > cannot be modelled as a belief network. Can it also not be modelled as a
    > joint? If not then we need some new concepts, as continuous time is
    > important to model.

    The general theory of "stochastic processes" treats collections of
    random variables indexed by the real line, other Euclidean spaces and
    (far, far) beyond. See, e.g., Sections 36 and 37 of Billingsley's
    "Probability and Measure" (my own favorite probability text) for
    a basic introduction. As you'll see, Kolmogorov, among his other
    achievements, established a set of consistency conditions under which
    a set of finite-dimensional distributions extends to a stochastic
    process. These conditions are widely used.
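
    For readers who want the shape of those conditions before opening the
    book, here is a rough sketch for real-valued variables. The notation
    is an assumption made for this note, not a quotation from Billingsley:
    \mu_{t_1,...,t_k} is the proposed joint law of (X_{t_1},...,X_{t_k}),
    the H_i are Borel sets, and \pi is any permutation of 1..k.

```latex
% Consistency under permutation of the index points, and under
% marginalizing out the last coordinate.
\begin{align*}
\mu_{t_{\pi(1)},\dots,t_{\pi(k)}}\bigl(H_{\pi(1)}\times\cdots\times H_{\pi(k)}\bigr)
  &= \mu_{t_1,\dots,t_k}\bigl(H_1\times\cdots\times H_k\bigr), \\
\mu_{t_1,\dots,t_{k-1}}\bigl(H_1\times\cdots\times H_{k-1}\bigr)
  &= \mu_{t_1,\dots,t_k}\bigl(H_1\times\cdots\times H_{k-1}\times\mathbb{R}\bigr).
\end{align*}
```

    Note that nothing here requires the index set to be countable, which
    is the point relevant to the question about continuous time.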

    Back on the general discussion, I think that it's important that Milan
    has pointed out some of the non-trivial issues that arise in making
    the notion of "conditional probability" rigorous. But even without
    getting into pathologies, one can see that some thought is required in
    order to handle continuous variables with any degree of honesty.

    In the discrete setting, one generally thinks of the conditional
    probability "P(A | B = b)" in terms of divvying up the probability
    mass assigned to the event {B = b}, among all of the possible states of A.
    I.e., we fix B = b and range over the probabilities P(A = a, B = b);
    these probabilities divvy up P(B = b).
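
    The "divvying up" picture can be sketched in a few lines of Python.
    The joint table below is made up purely for illustration, and the
    helper names (marginal_b, conditional_a_given_b) are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(A = a, B = b); the numbers are invented.
joint = {
    (0, "x"): F(1, 10), (1, "x"): F(2, 10), (2, "x"): F(1, 10),
    (0, "y"): F(3, 10), (1, "y"): F(1, 10), (2, "y"): F(2, 10),
}

def marginal_b(joint, b):
    """P(B = b): the total mass that gets divided among the states of A."""
    return sum(p for (_, bb), p in joint.items() if bb == b)

def conditional_a_given_b(joint, b):
    """P(A = a | B = b) for every a, obtained by dividing by P(B = b)."""
    mass = marginal_b(joint, b)
    if mass == 0:
        # There is nothing to divide when the event {B = b} has zero mass.
        raise ValueError(f"P(B = {b!r}) is zero; the ratio is undefined")
    return {a: p / mass for (a, bb), p in joint.items() if bb == b}

if __name__ == "__main__":
    for b in ("x", "y"):
        print(b, marginal_b(joint, b), conditional_a_given_b(joint, b))
```

    The ValueError branch is the discrete shadow of the difficulty with
    continuous variables: the recipe has no answer when P(B = b) = 0.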

    In the continuous setting, however, it's usually (but not always)
    the case that any given value of a random variable has zero probability
    mass. That's no problem in defining probabilities, roughly because we
    can integrate in a small region around any value. But when we condition,
    we're conditioning on a specific value, and therefore we end up trying
    to divvy up zero. It's difficult to divvy up zero in a meaningful way.
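
    A small numerical sketch of the problem, under a toy model chosen
    only because it has closed forms (it is an assumption of this
    example, not part of the discussion above): Y is uniform on (0, 1)
    and, given Y = t, X is exponential with rate t. The event {Y = y}
    has probability zero, but conditioning on a shrinking window around
    y gives a well-defined ratio that settles down as the window narrows.

```python
import math

def window_ratio(a, y, eps):
    """P(X <= a, |Y - y| < eps) / P(|Y - y| < eps) in the toy model.

    Assumes 0 < y - eps and y + eps < 1 so the window stays inside (0, 1).
    """
    # Integral of (1 - exp(-a * t)) over t in (y - eps, y + eps).
    joint = 2 * eps - 2 * math.exp(-a * y) * math.sinh(a * eps) / a
    # Y is uniform, so the window itself has probability 2 * eps.
    marginal = 2 * eps
    return joint / marginal

def limiting_value(a, y):
    """The value the ratio approaches as eps -> 0, namely 1 - exp(-a * y)."""
    return 1.0 - math.exp(-a * y)

if __name__ == "__main__":
    a, y = 2.0, 0.5
    for eps in (0.1, 0.01, 0.001):
        print(eps, window_ratio(a, y, eps), limiting_value(a, y))
```

    The limit works out nicely here, but it depends on how the window
    shrinks, which is one reason a more careful definition is needed.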

    Section 33 of Billingsley has an extremely lucid discussion of this issue,
    which I highly recommend to anyone who hasn't worked through these issues
    before. He provides a general, satisfying definition of conditional
    probability. I won't spoil your fun, but a basic insight is that instead
    of treating P(A | B) as a function of A for fixed B, one treats P(A | B)
    as a function of B for fixed A.
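
    To make the change of viewpoint concrete without giving away the
    general construction, here is a discrete sketch. The joint table and
    the event {A >= 1} are invented for illustration. With the event held
    fixed, f(b) = P(A >= 1 | B = b) is a function of b, and it satisfies
    an averaging identity over every set G of B-values. As I understand
    it, an identity of this kind is what the general definition takes as
    its starting point, since it still makes sense when P(B = b) = 0.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical joint distribution P(A = a, B = b); the numbers are invented.
joint = {
    (0, "x"): F(1, 10), (1, "x"): F(2, 10), (2, "x"): F(1, 10),
    (0, "y"): F(3, 10), (1, "y"): F(1, 10), (2, "y"): F(2, 10),
}
b_values = sorted({b for (_, b) in joint})

def in_event(a):
    """The fixed event about A; here {A >= 1}."""
    return a >= 1

def p_b(b):
    """P(B = b)."""
    return sum(p for (_, bb), p in joint.items() if bb == b)

def f(b):
    """P(A >= 1 | B = b), read as a function of b with the event fixed."""
    hit = sum(p for (a, bb), p in joint.items() if bb == b and in_event(a))
    return hit / p_b(b)

def averaging_identity_holds():
    """Check sum_{b in G} f(b) P(B = b) == P(A >= 1, B in G) for all G."""
    for r in range(len(b_values) + 1):
        for G in combinations(b_values, r):
            lhs = sum(f(b) * p_b(b) for b in G)
            rhs = sum(p for (a, b), p in joint.items()
                      if b in G and in_event(a))
            if lhs != rhs:
                return False
    return True

if __name__ == "__main__":
    print({b: f(b) for b in b_values}, averaging_identity_holds())
```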

    As a general remark on some of the discussions on probability theory
    that recur on the UAI list, I think that it's important to emphasize
    that probability theory is best viewed as a special case of measure
    theory, and it's not a conceit of the mathematicians that they settled
    on the machinery of measurable spaces, random-variables-as-functions
    and the like. In case you don't believe this, read Section 1 of
    Billingsley, which will convince you that without measure theory even
    some elementary results regarding coin-tossing are out of reach.

    Mike


