Re: [UAI] Definition of Bayesian network

From: Bob Welch (indianpeaks@home.com)
Date: Fri Jul 27 2001 - 07:47:22 PDT


    Rich, David

    Sorry to bother you with one more point of view. If I had to choose between
    the two definitions you gave, I would select your original. However, in reply
    to your last note:

    > I think your feelings about `not having a belief network if you only have
    > the joint' lies at heart of individual differences on this matter. It seems
    > to the mathematician you most certainly do, and, in fact, that is part of
    > her problem with defining a Bayesian network in terms of conditionals.
    > However, to the `engineer' you don't have a Bayesian network unless it can
    > be constructed practically.

    Indeed, the mathematician's viewpoint is perhaps best exemplified by an
    analogy to the existence of a solution of partial differential equations. Or
    more fundamentally, the recovery of a function from its derivatives. Another
    instance of the problem is found in Economics in the theory of revealed
    preference. Is there a preference relation that could have generated a given
    demand function? When you consider that the conditional probability (in the
    measure theoretic development of probability) is the Radon-Nikodym
    derivative of the joint probability with respect to a marginal, the analogy
    is almost exact. In all cases, the desire is to recover global properties
    using local information. And in each of these cases, there is a condition,
    often called an integrability condition, that allows one to splice together
    the local properties in deriving the global relationship. In Bayesian
    networks, we develop algorithms that allow us to derive properties of the
    global function (the joint distribution) based on the local properties (the
    conditionals) knowing that the latter are consistent with the existence of
    the former. We can do this because the integrability condition (that the
    implicit directed graph defined by the conditionals is a DAG) assures the
    consistency with some joint (at least for discrete networks)*.
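    In the discrete case this splicing is just the chain-rule product: attach
    any conditional probability tables to a DAG and they multiply out to a
    genuine joint. A minimal sketch (the two-node network A -> B and its
    tables are hypothetical, made up purely for illustration):

```python
# Discrete "integrability": CPTs attached to a DAG multiply to a valid joint.
# Hypothetical two-node network A -> B; the tables below are made up.
p_a = {0: 0.3, 1: 0.7}                       # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},          # P(B | A=0)
               1: {0: 0.4, 1: 0.6}}          # P(B | A=1)

# Chain rule: P(A, B) = P(A) * P(B | A)
joint = {(a, b): p_a[a] * p_b_given_a[a][b]
         for a in p_a for b in (0, 1)}

total = sum(joint.values())                  # sums to 1, so the joint exists
marg_a = {a: joint[(a, 0)] + joint[(a, 1)] for a in p_a}  # recovers P(A)
```

    The point is that no extra check is needed: because the graph is a DAG,
    any choice of tables yields a normalized joint.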

    Why should this be of interest to the engineer as well as the mathematician?
    This is because the real world variables are components of a real system
    whose properties the engineer wants to know. The additivity axiom is
    fundamental here because it defines the scope of the system (if there are
    leakages from our system, we insist on accounting for that possibility, or
    else argue that it is not important). Once we have defined the scope, we
    have to accept the consistency imposed by the system.

    My experience with engineers -- primarily chemical engineers who deal with
    control problems and have been out of school for some time -- is that they
    have trouble not with the concept of a system imposed relationship among
    variables (the joint distribution) but with a "network of causal
    relationships that assumes away feedback." And if that is the paradigm by
    which I try to introduce the Bayesian network, then they quickly become
    uninterested in a causal modeling tool that can't handle feedback. While I
    try to wave my hands about how feedback is a phenomenon of too coarse a time
    granularity, or that the Bayesian network has extensions that allow nodules
    of undirected loops, their eyes glaze over. Yet there is nothing inherent
    in the Bayesian network itself that won't represent systems with feedback.
    But by this time I have lost the argument.

    My experience has been that the sale is more successful if you first explain
    the concept of a "system of variables" and that the joint distribution
    describes that system. Then define the Bayesian network FUNCTIONALLY as
    something engineers do all the time: it is a collection of PDEs and
    algorithms whose function is to derive properties of the system using local
    pieces of information (which must satisfy an integrability or system
    consistency condition for the solution algorithms to produce something
    meaningful). And then give them the tools needed to build the model out of
    the equivalent of the PDE: the conditional probability tables and the
    network DAG.

    I find the chain rule to be the easiest guide to constructing a Bayesian
    network and it subsumes the causal model without feedback, since in the
    latter the variables have a natural order.
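    The chain-rule ordering also makes forward (logic) sampling immediate:
    sample each variable after its parents. A sketch for a hypothetical chain
    A -> B -> C, with made-up tables:

```python
import random

random.seed(0)  # reproducible run

# Hypothetical chain A -> B -> C; all tables are made up for illustration.
p_a = {True: 0.2, False: 0.8}                 # P(A)
p_b = {True: {True: 0.7, False: 0.3},         # p_b[a][b] = P(B=b | A=a)
       False: {True: 0.1, False: 0.9}}
p_c = {True: {True: 0.5, False: 0.5},         # p_c[b][c] = P(C=c | B=b)
       False: {True: 0.05, False: 0.95}}

def draw(dist):
    """Sample one value from a {value: probability} table."""
    r, acc = random.random(), 0.0
    for value, prob in dist.items():
        acc += prob
        if r < acc:
            return value
    return value  # guard against floating-point round-off

def forward_sample():
    # Chain-rule order: each variable is sampled after its parents.
    a = draw(p_a)
    b = draw(p_b[a])
    c = draw(p_c[b])
    return a, b, c

samples = [forward_sample() for _ in range(10000)]
freq_a = sum(1 for a, _, _ in samples if a) / len(samples)  # near 0.2
```

    With only the joint P(A, B, C) in hand there is no such direct recipe,
    which is David's point about having to fall back on methods like MCMC.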

    Now a discussion of arc reversals is in order. Having made an arc
    reversal, how does one interpret the meaning of an arc? Is it causality? Is
    it dependence? It may be difficult to build the Bayes net that represents
    a causal model with feedback, so why not train it from data? At this point
    you are out of the trap that a Bayesian network is synonymous with a causal
    network that has no feedback.
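    For the two-node case, an arc reversal is nothing more than Bayes' rule
    applied to the tables. A sketch with hypothetical numbers:

```python
# Reverse the arc A -> B into B -> A via Bayes' rule (hypothetical tables).
p_a = {0: 0.3, 1: 0.7}                       # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},          # P(B | A)
               1: {0: 0.4, 1: 0.6}}

# Marginal P(B), summing out A.
p_b = {b: sum(p_a[a] * p_b_given_a[a][b] for a in p_a) for b in (0, 1)}

# Reversed table P(A | B) = P(B | A) P(A) / P(B).
p_a_given_b = {b: {a: p_a[a] * p_b_given_a[a][b] / p_b[b] for a in p_a}
               for b in (0, 1)}
```

    The reversed network represents exactly the same joint, which is why the
    question of what a single arc "means" after reversal is so awkward.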

    Yet I like a causal model. When there is no feedback, I believe it gives the
    network that is easiest to understand and the simplest Bayesian network --
    one with a minimal number of arcs -- though I have yet to see a proof.

    Well, everybody has their own experience trying to convert engineers into
    Bayesians. This approach is a result of mine. It tries to use a language
    that engineers are very comfortable with: all engineers must learn about
    differential equations, but not that many learn about probability. I must
    admit, it's not always successful.

    Bob.

    P.S. *In the measure theory problem, the non-existence of a joint is an
    infinite-dimensional space paradox. A set of conditionals (R-N derivatives)
    may not have a measure that divides into discrete and absolutely continuous
    components even though the conditionals do. What happens is that the only
    measure that is consistent with the conditionals ends up piling positive
    mass on badly chopped-up sets that have Lebesgue measure zero. At least
    that is the best I can recall from the days when I actually worried (and
    wrote) about such things. David, I recall I did have some examples in that
    paper. I can dig up a reference if you like.

    ----- Original Message -----
    From: "Rich Neapolitan" <profrich@megsinet.net>
    To: <uai@cs.orst.edu>
    Sent: Thursday, July 26, 2001 9:57 AM
    Subject: Re: [UAI] Definition of Bayesian network

    > Dear David,
    > I think your feelings about `not having a belief network if you only have
    > the joint' lies at heart of individual differences on this matter. It seems
    > to the mathematician you most certainly do, and, in fact, that is part of
    > her problem with defining a Bayesian network in terms of conditionals.
    > However, to the `engineer' you don't have a Bayesian network unless it can
    > be constructed practically. I don't think there is any right or wrong here.
    > It is a matter of perspective. After all, even mathematicians sometimes
    > disagree on issues like this. E.g. some do not accept the axiom of choice.
    >
    > I have received a lot of responses to my original query, but I do not know
    > how many went only to me. So I will report the result. It seems individuals
    > are divided about equally on this issue, with many being fairly certain
    > only one of the definitions is acceptable. It reminds me of what someone (I
    > think Ross Shachter) told me years ago, when I first became interested in
    > this field: "There is nothing researchers are more certain about than
    > uncertainty."
    >
    > Sincerely,
    >
    > Rich
    >
    > p.s. In answer to your last recommendation, I agree. Indeed, I've been
    > teaching this material for the past 12 years, and I found students seem to
    > respond best when I define a Bayes net as a DAG plus a P that satisfies the
    > Markov condition with the DAG. Then I give an urn example of a P which does
    > do this with a DAG. Next I show P is the product of its conditional
    > distributions. Then I give the iff theorem. After this, I give the
    > practical applications. I state causal DAGs seems to exist in nature and
    > the actual relative frequency distribution of the variables seems to
    > satisfy the Markov condition with these DAGs. We can sometimes identify the
    > independencies and build the causal DAG, and we can estimate the
    > conditional distributions in the DAG. Although the resultant probability
    > distribution is only an estimate of the actual relative frequency
    > distribution that is `out there', due to the theorem, it still satisfies
    > the Markov condition with the DAG. So we end up with a Bayesian network
    > that represents our beliefs about the actual causal network in nature.
    >
    > Maybe it is due to their training in classical statistics courses, but, if I
    > take the other approach and just give them a DAG and conditionals to begin
    > with, they seem to think there is something hokey about the resultant
    > distribution, like it is just made up, and who cares if we have this
    > mathematical result that it satisfies the Markov condition with the DAG.
    >
    >
    > At 09:23 AM 7/25/01 -0700, David Poole wrote:
    > >
    > >There is a fundamental difference here.
    > >
    > >Suppose you have two variables, say A and B for which you have the joint
    > >distribution P(A,B). I would claim that you don't have a belief network
    > >(Bayes net) unless you have P(A) and P(B|A) or P(B) and P(A|B).
    > >
    > >Why should we care? If A and B are such that we can have the cumulative
    > >probability distribution (e.g., they are on subsets of the reals) then
    > >we can easily do stochastic simulation (e.g., logic sampling) if we have
    > >a belief network as above. (We can generate a random sample by
    > >generating two random numbers in [0,1]). If we just have P(A,B) then it is
    > >difficult to do stochastic simulation and we have to revert to methods
    > >such as MCMC. Such distributions where we can't easily specify them as
    > >a belief network arise very naturally when we are learning the structure
    > >of belief networks.
    > >
    > >Milan Studeny wrote:
    > >> My arguments in favour of the first definition are as follows.
    > >> The second approach does not seem suitable if one tries to go
    > >> behind discrete framework and to consider continuous Gaussian random
    > >> variables as some statisticians do. In fact, if one considers a
    > >> completely general probabilistic framework (random variables taking values in
    > >> general measurable spaces) then the concept of conditional probability
    > >> is quite complicated concept (from technical point of view) and the
    > >> described approaches are not equivalent in sense that a collection of
    > >> conditionals may exist for which no joint probability measure exists!
    > >
    > >Milan, Do you have an example or a reference for such a collection of
    > >conditionals?
    > >
    > >So Rich, to answer your question, you should think of whether all joint
    > >distributions are belief networks or just those ones for which you have
    > >the conditionals. As Milan says, sometimes the conditionals are hard to
    > >come by (in practice as well as in theory).
    > >
    > >But I think we are exactly the wrong people to ask about what is a
    > >natural definition of a belief network! You should be asking the people
    > >who don't know about them whether your definition makes sense or helps
    > >them understand the concept. (Of course you don't want to define
    > >something that doesn't coincide with the normal definitions).
    > >
    > >Good luck with your book.
    > >
    > >David
    > >
    > >--
    > >David Poole, Office: +1 (604) 822-6254
    > >Department of Computer Science, poole@cs.ubc.ca
    > >University of British Columbia, http://www.cs.ubc.ca/spider/poole
    > >
    > >
    > >
    >
    >
    > Rich Neapolitan
    > Computer Science Department
    > Northeastern Illinois University
    > 5500 N. St. Louis
    > Chicago, Il 60625
    >
    >



    This archive was generated by hypermail 2b29 : Fri Jul 27 2001 - 07:49:40 PDT