Rich:
I find that a Bayesian network is often needlessly difficult to explain
intuitively when approached from local primitives. The reason is that the
BN is not unique to a given joint distribution: given that the joint
exists, there is a Bayesian network for every possible ordering of the
variables. The chain rule promises that the product of the conditionals,
each given its antecedents in the ordering, equals the joint. It is then
only a matter of refining the parents of xj to be the smallest subset of
x1 ... xj-1 needed to make xj independent of the other antecedents.
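In symbols (pa(x_j) is my notation, just to make the reduction explicit;
this is the standard chain-rule argument):

    P(x_1,\dots,x_n)
      = \prod_{j=1}^{n} P(x_j \mid x_1,\dots,x_{j-1})
      = \prod_{j=1}^{n} P\bigl(x_j \mid \mathrm{pa}(x_j)\bigr),

    \text{where } \mathrm{pa}(x_j) \subseteq \{x_1,\dots,x_{j-1}\}
    \text{ is minimal such that }
    x_j \perp \{x_1,\dots,x_{j-1}\} \setminus \mathrm{pa}(x_j)
    \mid \mathrm{pa}(x_j).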
Now form the entire set of conditionals you can define in this manner and
randomly select one for each variable x of the form P{x | p(x)}. Clearly
you can't expect that random selection to conform to a DAG or the Markov
condition: with just two variables, selecting both P{x1 | x2} and
P{x2 | x1} already puts an arc in each direction. Nor should you expect
the knowledge engineer to blithely go out into the world to discover
conditional relationships and expect to find a Bayesian network (even
though they are all over the place).
The point is that if the reader of your book sees the definition, then he
may be led to believe that it gives him a guide as to how to begin
searching for a Bayesian network. But starting with a set of primitive
local relationships, such as conditional independences or conditional
probabilities, does not necessarily result in discovery of the global DAG
or the Markov property.
A guiding principle like causality may help, but it is no guarantee. My
non-believer friends immediately want to interpret the arcs in a Bayes net
as causal arcs, and then argue that feedback is the counter to a DAG. They
go on to dismiss the Bayes net as a construct, even though the existence
of a Bayes net is as tautological as the existence of a joint probability
distribution, which they are willing to accept even when there is
feedback.
So why not start with a definition of a Bayesian network as a
multiplicative decomposition of a joint distribution, guaranteed by the
chain rule? Then prove the iff theorems that the Markov property is
satisfied and that the implicit network is a DAG. It's not that hard to
believe that joints exist for almost any set of variables -- and so
Bayesian networks are always around. And now we see that the DAGs must
always be there, as well as the Markov properties, even though they seem
like harder conditions to satisfy.
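To make that concrete, here is a toy sketch in Python of the chain-rule
construction. The three-variable joint, the function names, the
brute-force subset search, and the tolerance are all my own invention for
illustration; the search is exponential, and the only point is that a DAG
always falls out.

    import itertools

    # Toy joint over three binary variables A, B, C built from a chain
    # A -> B -> C: P(A=a) = 0.5, P(B = A) = 0.8, P(C = B) = 0.7.
    joint = {}
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                joint[(a, b, c)] = (0.5 * (0.8 if b == a else 0.2)
                                        * (0.7 if c == b else 0.3))

    def prob(fixed):
        # Marginal probability of a partial assignment {var_index: value}.
        return sum(p for outcome, p in joint.items()
                   if all(outcome[i] == v for i, v in fixed.items()))

    def cond(j, xj, given):
        # P(x_j = xj | given), or None if the conditioning event has prob 0.
        denom = prob(given)
        return None if denom == 0 else prob({**given, j: xj}) / denom

    def independent_given(j, others, parents, tol=1e-9):
        # Is x_j independent of `others` given `parents`? (binary variables)
        for pa_vals in itertools.product((0, 1), repeat=len(parents)):
            pa = dict(zip(parents, pa_vals))
            for o_vals in itertools.product((0, 1), repeat=len(others)):
                full = {**pa, **dict(zip(others, o_vals))}
                for xj in (0, 1):
                    p, q = cond(j, xj, full), cond(j, xj, pa)
                    if p is not None and q is not None and abs(p - q) > tol:
                        return False
        return True

    def minimal_parents(j, order):
        # Smallest subset of the antecedents that screens off the rest.
        antecedents = order[:order.index(j)]
        for size in range(len(antecedents) + 1):
            for subset in itertools.combinations(antecedents, size):
                rest = [v for v in antecedents if v not in subset]
                if independent_given(j, rest, list(subset)):
                    return list(subset)
        return antecedents

    order = [0, 1, 2]                     # the ordering A, B, C
    for j in order:
        print(j, "<-", minimal_parents(j, order))
    # prints: 0 <- []   1 <- [0]   2 <- [1]   -- the DAG A -> B -> C

On this joint the minimal parent sets recover the chain 0 -> 1 -> 2, and
running the same code with the reversed ordering [2, 1, 0] yields the
different, equally valid DAG 2 -> 1 -> 0 -- which is exactly the
non-uniqueness point above.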
I too like the idea of building systems from primitive local concepts.
But the power of the Bayesian network approach over rule-based expert
systems is in recognizing that there are global constraints these
primitives must satisfy in order to use them consistently. As an abstract
concept, I for one find the additivity axiom for the existence of a joint
easier to swallow than a DAG or the Markov property.
Bob Welch
----- Original Message -----
From: <profrich@megsinet.net>
To: <uai@cs.orst.edu>
Sent: Wednesday, July 18, 2001 1:29 PM
Subject: [UAI] Definition of Bayesian network
> Dear Colleagues,
>
> In my 1990 book I defined a Bayesian network approximately as follows:
>
> Definition of Markov Condition: Suppose we have a joint probability
> distribution P of the random variables in some set V and a DAG G=(V,E). We
> say that (G,P) satisfies the Markov condition if for each variable X in V,
> {X} is conditionally independent of the set of all its nondescendants
> given the set of all its parents.
>
> Definition of Bayesian Network: Let P be a joint probability distribution
> of the random variables in some set V, and G=(V,E) be a DAG. We call (G,P)
> a Bayesian network if (G,P) satisfies the Markov condition.
>
> The fact that the joint is the product of the conditionals is then an iff
> theorem.
>
> I used the same definition in my current book. However, a reviewer
> commented that this was nonstandard and unintuitive. The reviewer suggested
> I define it as a DAG along with specified conditional distributions (which
> I realize is more often done). My definition would then be an iff theorem.
>
> My reason for defining it the way I did is that I feel `causal' networks
> exist in nature without anyone specifying conditional probability
> distributions. We identify them by noting that the conditional
> independencies exist, not by seeing if the joint is the product of the
> conditionals. So to me the conditional independencies are the more basic
> concept.
>
> However, a researcher with whom I discussed this noted that what numbers
> you plan to store at each node is not provable from my definition, yet it
> should be part of the definition, as Bayes nets are not only statistical
> objects but also computational objects.
>
> I am left undecided about which definition seems more appropriate. I would
> appreciate comments from the general community.
>
> Sincerely,
>
> Rich Neapolitan
>