DCG vs DAG Compilation

canarelli (patrick.canarelli@filnet.fr)
Mon, 22 Jun 1998 13:10:01 +0200

Hello,
I thank the people who have answered my previous query about directed cyclic
graphs (DCGs) in probabilistic networks.
For those interested, I am forwarding the answers I have received (see below).
Patrick.

My question was:
________________________________________________________________________________
Hello,

I would be glad if someone could give me any pointers or advice relating
to the following:

I am particularly interested in the field of learning Bayesian networks from
data. One central assumption in Bayesian networks (BNs) is the DAG
(directed acyclic graph) assumption: loops are thus forbidden. This in
particular allows the joint probability to be decomposed and thus, when
e.g. inducing a Bayesian network from data, allows for a more efficient search.
Yet this is quite a restrictive assumption given real-world issues. For
instance, in system dynamics one important aspect is the concept of feedback
loops, which correspond exactly to local directed cyclic graphs (DCGs). We
then have the following pattern: A->B, B->C, C->A, which is forbidden in BNs.
So, my 2 questions are:
So, my 2 questions are:

1/ Is the DAG assumption in BNs absolutely needed, or can cycles be allowed,
even at a high computational cost? While the DAG assumption is emphasized in
all papers I have read on Bayesian networks, I have never found any clear
justification for it (or have missed it).

2/ Has any work already been done on learning such probabilistic networks
from data while allowing for DCGs (either using Bayesian networks or any
other technique)? If so, can anyone give me some pointers to references?

I thank you in advance for any information on the subject.

Patrick.
_____________________________________________________________________________

>>> I received quite a lot of replies, which shows the interest in (and
debate on) the subject:

____________________________________________________________________________
Date: Fri, 12 Jun 1998 18:40:09 +0200 (MET DST)
From: Peter Lucas <lucas@cs.uu.nl>
X-Sender: lucas@chios
To: canarelli <patrick.canarelli@filnet.fr>
Subject: Re: DCG vs DAG in bayesian networks
X-Org: Department of Computer Science; Utrecht University
X-Org: P.O. Box 80.089; 3508 TB Utrecht; The Netherlands.
X-Org: phone: +31-30-2531454; telefax: +31-30-2513791

On Fri, 12 Jun 1998, canarelli wrote:

> I am particularly interested in the field of learning Bayesian networks from
> data. One central assumption in Bayesian networks (BNs) is the DAG
> (directed acyclic graph) assumption: loops are thus forbidden. This in
> particular allows the joint probability to be decomposed and thus, when e.g. inducing

This is exactly the reason why the restriction to DAGs is required:
the factorisation of the joint probability distribution
P(X1,...,Xn) must stop somewhere, since:

P(X1,...,Xn) = P(X1 | X2,...,Xn) * P(X2,...,Xn)

and P(X2,...,Xn) can again (recursively) be written in terms of a product of
a conditional probability and P(X3,...,Xn), finally ending up
with the prior probability P(Xn). Now, you can do this in any particular
order, but whichever order you choose, you will end up with a DAG, which
has the following structure (depending on the choice of X1,...,Xn):

P(X1 | X2,...,Xn)
P(X2 | X3,...,Xn)
P(X3 | X4,...,Xn)
.
.
.
P(Xn)

Of course, usually (or hopefully) the DAG is sparse due to taking (conditional)
independencies into account, which may result in simplifications
like: P(X1 | X2,...,Xn) = P(X1 | X2). In this case, instead of
drawing arcs from X2,...,Xn to X1, you just draw an arc from
X2 to X1, because X1 is conditionally independent of X3,...,Xn
given X2.
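
As a concrete illustration of such a simplification, here is a minimal
Python sketch (the variables A, B, C and all numbers are made up), in
which A is conditionally independent of C given B:

import itertools

# Hypothetical factors of P(A,B,C) after simplification:
# P(A,B,C) = P(A|B) * P(B|C) * P(C)
p_c = {0: 0.7, 1: 0.3}                                                # P(C=c)
p_b_given_c = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}    # P(B=b|C=c)
p_a_given_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.25, (1, 1): 0.75}  # P(A=a|B=b)

def joint(a, b, c):
    # Chain rule along the order C, B, A, simplified by the independencies
    return p_a_given_b[(a, b)] * p_b_given_c[(b, c)] * p_c[c]

# Because each factor is a proper conditional distribution, the product
# is automatically a normalised joint distribution:
total = sum(joint(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
print(total)  # -> 1.0 (up to floating point)

Since every factor is a conditional probability table read off the DAG,
no explicit normalisation step is needed; this is where the ordering does
the work.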

I hope this helps.

Peter

--
Peter Lucas
Dept. of Computer Science, Utrecht University
Padualaan 14, 3584 CH Utrecht, The Netherlands
Tel: + 31 30 2534094; E-mail: lucas@cs.uu.nl

______________________________________________________________________________

From: alexvk@vostok.engr.sgi.com (Alexander V. Kozlov)
Subject: Re: DCG vs DAG in bayesian networks
To: patrick.canarelli@filnet.fr (canarelli)
Date: Fri, 12 Jun 1998 13:33:06 -0700 (PDT)
WWW-page: http://www.stanford.edu/~alexvk

canarelli wrote:

> 1/ Is the DAG assumption in BNs absolutely needed, or can cycles be allowed,
> even at a high computational cost? While the DAG assumption is emphasized in
> all papers I have read on Bayesian networks, I have never found any clear
> justification for it (or have missed it).

The DAG assumption is not necessary, but it helps to automatically get the normalization that the sum of all probabilities is one. A more general representation of a probability distribution is a Markov network (or a Gibbs field). In the latter case every maximal clique in the graph represents a factor f(x_1, x_2, ..., x_n) in the joint probability decomposition. There can be loops in this case.
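
An illustrative sketch in Python (the factor values are made up): the cyclic
structure A-B-C from the original question, treated as a Markov network.
Unlike in a BN, normalisation must be done explicitly, via the partition
function Z:

import itertools

# One factor per maximal clique of the (cyclic) graph A - B - C - A;
# each factor merely favours agreement between its two variables.
def f_ab(a, b): return 2.0 if a == b else 1.0
def f_bc(b, c): return 2.0 if b == c else 1.0
def f_ca(c, a): return 2.0 if c == a else 1.0

def unnormalised(a, b, c):
    return f_ab(a, b) * f_bc(b, c) * f_ca(c, a)

states = list(itertools.product((0, 1), repeat=3))
Z = sum(unnormalised(*s) for s in states)   # partition function

def p(a, b, c):
    return unnormalised(a, b, c) / Z

print(sum(p(*s) for s in states))  # -> 1.0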

> 2/ Has any work already been done on learning such probabilistic networks
> from data while allowing for DCGs (either using Bayesian networks or any
> other technique)? If so, can anyone give me some pointers to references?

I do not know of any references offhand, but there has been work on Markov fields, particularly in statistics. I think AI people prefer BNs since they associate the direction of edges with causal influence (which is still a debatable question in statistics).

As far as I know, probabilistic inference in Markov networks is in general not more expensive than in BNs. There are some special cases, however. For example, barren-node reduction (as defined by Shachter) can be done only in BNs. On the other hand, the Ising model in physics is a Markov network, and exact probabilistic inference is impossible there except in special cases. The standard AI probabilistic inference algorithms would not work there.
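
For illustration, a minimal sketch of barren-node reduction in Python, on a
made-up chain A -> B -> C -> D with a query involving only A and B (nodes
that are neither queried nor observed and have no children are pruned
repeatedly):

def prune_barren(nodes, edges, query_or_evidence):
    """Repeatedly remove leaf nodes that are neither queried nor observed."""
    nodes, edges = set(nodes), set(edges)
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            is_leaf = not any(parent == n for parent, _ in edges)
            if is_leaf and n not in query_or_evidence:
                nodes.discard(n)
                edges = {(p, c) for p, c in edges if c != n}
                changed = True
    return nodes, edges

# D is barren, and once D is gone, C becomes barren too:
print(prune_barren("ABCD", {("A", "B"), ("B", "C"), ("C", "D")}, {"A", "B"}))
# -> ({'A', 'B'}, {('A', 'B')})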

-- 
Alexander V. Kozlov | alexvk@engr.sgi.com | (650) 933-8493

_____________________________________________________________________________

Date: Mon, 15 Jun 1998 09:01:55 +0200
From: "Uffe Kjærulf" <uk@cs.auc.dk>
Organization: Department of Computer Science, Aalborg University
To: canarelli <patrick.canarelli@filnet.fr>
Subject: Re: DCG vs DAG in bayesian networks
References: <B0000524966@mail.filnet.fr>

Dear Patrick,

My usual argument for not permitting DCGs is the following.

A Bayesian network is given by a pair (G=(V,E), p), where G is a graph representing the independence properties of the joint probability distribution p over the set of variables indexed by the set of nodes V={A,B,...,D}.

Using the chain rule for probabilities, p can be decomposed into a product of the form p = p(A|B,...,D) p(B|C,...,D) ... p(D). This factorization can be illustrated by the DAG with A having all the other variables as parents, B having all the other variables except A as parents, etc. This graph is a valid independence graph for p, although not a very interesting one, as it does not represent any of the independence properties of p.

The independence properties of p result in simplifications of the terms, e.g. p(A|B,...,D) = p(A|B). Note that the DAG property is invariant under such simplifications. Thus the independence graph of a Bayesian network will inevitably be a DAG (or a chain graph, which is a generalization of a DAG).

In other words, the DAG structure is an inherent property of fundamental probability theory.

Regards,

Uffe Kjærulff
Dept. of Computer Science
Aalborg University
Denmark

____________________________________________________________________________

X-Lotus-FromDomain: EVEREST
From: h.van.leijen@everest.nl
To: patrick.canarelli@filnet.fr
Date: Mon, 15 Jun 1998 09:33:05 +0200
Subject: Re: DCG vs DAG in bayesian networks

Patrick,

An excellent discussion of the problem can be found in a paper by Peter Spirtes from CMU, available on the Internet: "Directed Cyclic Graphical Representation of Feedback Models", at http://hss.cmu.edu/HTML/departments/philosophy/TETRAD/tetrad.html.

Basically, the probability distribution of a belief network is defined by multiplying the individual probabilities of each node given a configuration of its parents. The justification for this lies in the chain rule. However, the chain rule demands that it be possible to enumerate the nodes in such a way that a child always comes after its parents in the enumeration, which implies acyclicity. In a textbook like Neapolitan's, it is proven that this definition of a probability distribution satisfies the independence relation induced by the graph, but only if the graph is acyclic. Spirtes' paper generalizes this a little, but only for continuous nodes, not for discrete ones.
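
To make the link between such an enumeration and acyclicity concrete, here is
a minimal Python sketch (the graphs are illustrative): Kahn's algorithm
either produces an ordering in which every child follows its parents, or
reports that none exists:

from collections import deque

def topological_order(nodes, edges):
    """Return a parents-before-children ordering, or None if the graph is cyclic."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order if len(order) == len(nodes) else None

print(topological_order("ABC", [("A", "B"), ("B", "C")]))              # ['A', 'B', 'C']
print(topological_order("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))  # None (feedback loop)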

If you find more references, could you let me know?

Sincerely, Hans van Leijen.

____________________________________________________________________________

Date: Mon, 15 Jun 1998 09:25:26 +0200
From: "Prof. Frank Lad" <flad@dipmat.unipg.it>
To: patrick.canarelli@filnet.fr
Subject: from Frank Lad, re dcg etc

Hello Patrick,

I have just seen your query regarding cyclic graphs. I think your concerns are exactly correct. I have written a paper that I can send you soon: "A Subjectivist Assessment of Bayesian Networks: a challenge to the principles and the practice". It will be published this year in the journal "Soft Computing" in a special issue being prepared. One section contains specific remarks about the subject of cyclic graphs and their treatment.

More generally, the result you would like to know about is called "the fundamental theorem of prevision", generated in the tradition of the subjectivist work of Bruno de Finetti. You can read about it and many other things in my book "Operational Subjective Statistical Methods: a mathematical, philosophical, and historical introduction", 1996, New York: John Wiley. It is available in one bookstore in Paris; I know of a store that stocks Wiley books. Also, it should be in the university library.

I must go now. Please write me here if you are interested.

Best wishes,
Frank Lad

_____________________________________________________________________________

Date: Mon, 15 Jun 1998 11:06:08 +0200
From: Hermann von Hasseln <von.hasseln@dbag.stg.daimlerbenz.com>
Organization: Daimler Benz AG, Research and Technology
To: patrick.canarelli@filnet.fr
Subject: DCG

Dear Patrick,

I did some work on directed cyclic graphs. What I did was to extend an old algorithm from statistics (iterative proportional fitting - IPF) to include conditional probabilities. This algorithm can be used to 'learn' cyclic structures.
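
For readers unfamiliar with IPF, here is a minimal Python sketch of the
classical (unconditional) algorithm, with made-up target marginals; the
extension to conditional probabilities is what the paper describes and is
not reproduced here:

# Start from a uniform 2x2 joint over (X, Y) and fit it to target marginals.
table = [[0.25, 0.25], [0.25, 0.25]]
target_x = [0.6, 0.4]   # desired marginal distribution of X (rows)
target_y = [0.3, 0.7]   # desired marginal distribution of Y (columns)

for _ in range(50):
    # Scale each row to match the X marginal ...
    for i in range(2):
        row_sum = sum(table[i])
        for j in range(2):
            table[i][j] *= target_x[i] / row_sum
    # ... then each column to match the Y marginal.
    for j in range(2):
        col_sum = table[0][j] + table[1][j]
        for i in range(2):
            table[i][j] *= target_y[j] / col_sum

print(table)  # row sums match target_x, column sums match target_y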

If you are interested I would be glad to send you the paper.

Regards, Hermann

Dr. Hermann von Hasseln
Daimler Benz AG
Research and Technology
FT2/EK HPC:T721
D-70546 Stuttgart
Germany
Fon: (+711) 17-41081
Fax: (+711) 17-41717
mailto:von.hasseln@dbag.stg.daimlerbenz.com

___________________________________________________________________________

Date: Mon, 15 Jun 1998 12:06:04 +0200 (METDST)
From: Milan Studeny <studeny@utia.cas.cz>
To: canarelli <patrick.canarelli@filnet.fr>
Cc: uai@CS.ORST.EDU
Subject: Re: DCG vs DAG in bayesian networks
Organization: Institute of Information Theory and Automation (UTIA AV CR)

On Fri, 12 Jun 1998, canarelli wrote and raised two questions:

> 1/ Is the DAG assumption in BNs absolutely needed, or can cycles be allowed,
> even at a high computational cost? While the DAG assumption is emphasized in
> all papers I have read on Bayesian networks, I have never found any clear
> justification for it (or have missed it).

In my opinion, the DAG assumption (= the restriction to Bayesian networks) is
mainly a tradition, partially justified by the really good properties
(= simplicity) of these models.

> 2/ Has any work already been done on learning such probabilistic networks
> from data while allowing for DCGs (either using Bayesian networks or any
> other technique)? If so, can anyone give me some pointers to references?

I am not sure about the specific question of LEARNING more general graphical models. Nevertheless I can give you references to some papers which are devoted to more general graphical models (and related theoretical questions).

General directed (cyclic) graphs were studied by Spirtes (ps7z@andrew.cmu.edu)
and Richardson (tsr@stat.washington.edu). See for example the following papers:

- P. Spirtes: Directed cyclic graphical representation of feedback models. In
  Proceedings of Uncertainty in Artificial Intelligence 1995, 491-498.
- T. S. Richardson: A discovery algorithm for directed cyclic graphs. In
  Proceedings of Uncertainty in Artificial Intelligence 1996, 462-469.

Remco Bouckaert and I (and some other people) have been interested in chain
graphs, invented earlier by S. L. Lauritzen and N. Wermuth. These graphs allow
both undirected and directed edges but are acyclic! You can download several
of our papers on this topic from my home page
http://www.utia.cas.cz/user_data/studeny/studeny_home.html in 'Selected
papers' under the item 'Chain graphs'. In these papers you can find references
to other authors. (Warning: I am going to update my home page in the near
future.) For a short look see:

- M. Studeny: On separation criterion and recovery algorithm for chain graphs.
  In Proceedings of Uncertainty in Artificial Intelligence 1996, 509-516.

Finally, Jan Koster (koster@soc.fsw.eur.nl) has developed a general class of
reciprocal graphs which includes both general directed graphs and chain
graphs. See for example:

- J. T. A. Koster: Markov properties of non-recursive causal graphs. The
  Annals of Statistics 24 (1996), 2148-2177.

There are alternative chain graphs, studied by Andersson, Madigan and Perlman,
which fall in the framework of the more general joint-response chain graphs
recently proposed by N. Wermuth and D. R. Cox. I think you can ask David
Madigan (madigan@stat.washington.edu) for more details; he can tell you more
about 'learning aspects'. See for example:

- S. A. Andersson, D. Madigan, M. D. Perlman: An alternative Markov property
  for chain graphs. In Proceedings of Uncertainty in Artificial Intelligence
  1996, 40-48.

That's all. I would like to apologize to all authors interested in more
general graphical models who were not mentioned.

Regards from Milan Studeny

_____________________________________________________________________________

Date: Mon, 15 Jun 1998 04:14:59 -0700 (PDT)
From: Judea Pearl <judea@CS.UCLA.EDU>
To: canarelli <patrick.canarelli@filnet.fr>
CC: Milan Studeny <studeny@utia.cas.cz>
Subject: Re: DCG vs DAG in bayesian networks

On Fri, 12 Jun 1998, canarelli wrote and raised two questions:

> 1/ Is the DAG assumption in BNs absolutely needed, or can cycles be allowed,
> even at a high computational cost? While the DAG assumption is emphasized in
> all papers I have read on Bayesian networks, I have never found any clear
> justification for it (or have missed it).

Directed cyclic graphs have a natural meaning as causal graphs, but as carriers of conditional independence, one needs to define how independencies are to be read. Causal graphs with either linear functions or discrete variables do carry the independencies dictated by d-separation.

Ref: Galles, D. & Pearl, J., "Axioms of Causal Relevance," Artificial
Intelligence, 1997, special issue on relevance.

Pearl, J. and Dechter, R., "Identifying independencies in causal graphs with
feedback," in E. Horvitz and F. Jensen (Eds.), Uncertainty in Artificial
Intelligence: Proceedings of the Twelfth Conference, Morgan Kaufmann: San
Francisco, CA, 240-246, August 1996.

______________________________________________________________________________

Date: Mon, 15 Jun 1998 8:00:55 UTC-0700
From: Brent Boerlage <boerlage@cs.ubc.ca>
To: patrick.canarelli@filnet.fr
Subject: DCG vs DAG in bayesian networks

Patrick Canarelli,

You wrote:

> ... Loops are thus forbidden. ...
> ...
> Yet this is quite a restrictive assumption given real-world issues. For
> instance, in system dynamics one important aspect is the concept of feedback
> loops, which correspond exactly to local directed cyclic graphs (DCGs). We
> then have the following pattern: A->B, B->C, C->A, which is forbidden in BNs.

When you asked about loops in BNs, you mentioned that you wanted to use them
for system dynamics, in particular for feedback loops.

The most common way of using BNs for system dynamics is to have a separate node to stand for the value of a variable at each particular point in time (discretizing time into "time slices").

In that case, variable X might result in many nodes: X1, X2, ..., Xn. For each node you would get a distribution giving the probability that X takes on each of its values at that point in time. Using this representation, no loops are required for feedback.
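
A minimal Python sketch of this unrolling (structure only, and using the
simple convention that every link is a time-delay link): the feedback loop
A->B, B->C, C->A from the original question becomes a DAG, since every edge
points forward in time:

def unroll(edges, n_slices):
    """Each original edge A -> B becomes (A, t) -> (B, t + 1)."""
    return [((parent, t), (child, t + 1))
            for t in range(n_slices - 1)
            for parent, child in edges]

feedback = [("A", "B"), ("B", "C"), ("C", "A")]
for parent, child in unroll(feedback, 3):
    print(parent, "->", child)
# ('A', 0) -> ('B', 1), ('B', 0) -> ('C', 1), ('C', 0) -> ('A', 1), ...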

A disadvantage of this time-sliced representation is that it is verbose, and often you want the conditional probability tables for the nodes to be constant over time. So the simple solution is a sort of "meta-BN" representation, in which one node is used to represent the variable at all times, and a new type of link is introduced (a "time-delay" link), which is understood to have a parent at a different point in time than its child.

This "meta-BN" is just a shorthand for the unrolled version. There have been many papers on the subject, under the name of temporal BN or dynamic BN (although dynamic BN can also mean something quite different), but the only one I have a complete reference for while sitting at this computer is: (sorry to the authors of the other papers)

Provan, Gregory M. and John R. Clarke (1993) "Dynamic network construction and updating techniques for the diagnosis of acute abdominal pain" in IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 299-307.

You can explore some of these ideas with the Netica computer program, which can represent time-delay links and do the "unrolling" automatically (and can also learn the conditional probabilities). If you download the program (www.norsys.com), try the "Bouncing" example as described in the onscreen Help. Overall the program is pretty smooth and mature, but be warned that this particular feature is pretty rough and in its early stages.

Regards,
Brent

Brent Boerlage
boerlage@norsys.com
Norsys Software Corp.
www.norsys.com

_____________________________________________________________________________

Date: Mon, 15 Jun 1998 21:49:27 +0500 (GMT+0500)
From: Ravi C Venkatesan <rcv_dac@giaspn01.vsnl.net.in>
To: canarelli <patrick.canarelli@filnet.fr>
Cc: uai@CS.ORST.EDU
Subject: DCG v/s DAG

Dear Patrick,

As I understand it, when one tries to "learn" a Bayesian situation from data, one is trying to fit a true but unknown probability distribution with an empirical distribution. This in itself does not require the acyclicity assumption. The DAG assumption is a matter of *convenience* that is rather artificially imposed so as to give some sort of *sequential nature* to the estimation process. If $x_1$ does not depend on anything else, then one can infer $P(x_1)$ directly. If $x_2$ depends only on $x_1$ and nothing else, then, having made some sort of educated guess as to $P(x_1)$ and having observed the pairs $(x_1,x_2)$ empirically, one can estimate $P(x_2|x_1)$, and so on. Surely it should be possible to break this DAG assumption, because I see no mathematically valid reason for imposing it (as opposed to mere convenience).
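
A minimal Python sketch of this sequential estimation, on made-up data:

from collections import Counter

# Observed (x1, x2) pairs, invented for illustration.
pairs = [(0, 0), (0, 1), (0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 1)]

c_x1 = Counter(x1 for x1, _ in pairs)
p_x1 = {v: c / len(pairs) for v, c in c_x1.items()}        # empirical P(x1)

c_pair = Counter(pairs)
p_x2_given_x1 = {(x1, x2): c_pair[(x1, x2)] / c_x1[x1]     # empirical P(x2|x1)
                 for (x1, x2) in c_pair}

print(p_x1)
print(p_x2_given_x1)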

But let me state that this is just an opinion.

Regards,

Ravi Venkatesan

_____________________________________________________________________________

Date: Tue, 16 Jun 1998 15:21:21 +0200
From: "Prof. Frank Lad" <flad@dipmat.unipg.it>
To: patrick.canarelli@filnet.fr
Subject: from frank lad

Hello Patrick,

Here is the paper in LaTeX. There is one figure missing, since it cannot be constructed with LaTeX. I will be in Paris 4-12 July and can pass it on to you if you are interested. Anyhow, here comes the paper.

Regards,
Frank

_____________________________________________________________________________

X-Sender: klaskey@osf1.gmu.edu
Date: Tue, 16 Jun 1998 12:08:03 -0500
To: canarelli <patrick.canarelli@filnet.fr>
From: Kathryn Blackmond Laskey <klaskey@gmu.edu>
Subject: Re: DCG vs DAG in bayesian networks

There is a large literature on graphical models, of which Bayesian networks are a subset. Check out the book by Whittaker and the program BUGS, which is available over the web.

Kathy Laskey

____________________________________________________________________________

Date: Wed, 17 Jun 1998 10:26:40 +0200
From: "Prof. Frank Lad" <flad@dipmat.unipg.it>
To: patrick.canarelli@filnet.fr
Subject: from Frank Lad again
Cc: flad@netserver.dipmat.unipg.it

Hello Patrick,

Let me send you another section to include in your printing of the paper I sent you yesterday. It is another version of the section entitled "Directed Graphs with a Feedback Cycle? ... No Problem!". The reason I send it is that I have learned there is some dissension among the network people concerning exactly what conditional probability equalities are presumed when one makes a directed cyclic graph of this type.

Originally I had presumed the equalities that are discussed in the section which follows, which I am sending to you now. However, when I was involved in discussions with Finn Jensen at a meeting in Erice, Sicily, in 1997, he told me that I was mistaken in this detail, and that the probability equalities implied by a cyclic graph are of the type I analyse and discuss in the paper I've already sent to you. However, someone here in Perugia has told me that some people think the implied equalities are those which I had previously presumed! It is for this reason that I am also sending you the following section of the previous version of my paper.

For printing it, I suggest you just insert this "section" of the LaTeX file into the tex file I've already sent you, immediately after the section that appears in it with the same title.

Anyhow, best wishes,
Frank Lad

__________________________________________________________________________

Sender: murphy@crl.dec.com
Date: Wed, 17 Jun 1998 14:18:57 -0400
From: Kevin Murphy <murphy@crl.dec.com>
To: canarelli <patrick.canarelli@filnet.fr>
Subject: Re: DCG vs DAG in bayesian networks
References: <B0000524966@mail.filnet.fr>

Thomas Richardson, now at U. Washington, has worked on DCGs. In our group at Berkeley, we are working on DBNs (dynamic BNs: essentially DCGs unrolled in time to eliminate directed cycles); see our UAI98 paper at www.cs.berkeley.edu/~murphyk/publ.html

Kevin

_______________________________________________________________

Patrick Canarelli
Managing Director
COMPLEX SYSTEMS
149 rue Montmartre
F-75002 PARIS

Tel/Fax: (33) 1 42 21 47 86
Email: patrick.canarelli@filnet.fr
_______________________________________________________________