Relaxing the IID assumption

Helge Langseth (Helge.Langseth@indman.sintef.no)
Tue, 12 Jan 1999 16:19:10 +0100

Dear Colleagues,

When we build Bayesian networks from data, we usually assume that the
observations are independently and identically distributed given the
generating model. I am currently investigating a dataset where the data
are clearly not identically distributed.

The data I want to analyze come from continuous monitoring of a chemical
process. To be able to intervene when the process is about to become
unstable, I've generated a dynamic BN from the observed data to better
understand the system. An interesting result is that some variables, which
are conditionally independent when the process is under control, are not
conditionally independent when the system is unstable. That is, the
generating models of two different time slices are generally not
identical. Furthermore, it seems that the absence/presence of these
distinguishing correlations has more predictive power than the state the
process is in, i.e. the information "X1 _||_ X2 | X3" is more valuable
than knowing (X1=x1, X2=x2, ...).
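
To make this concrete, here is a minimal sketch of how the presence or
absence of such a conditional independence could be monitored over time.
It assumes the variables have been discretized and integer-coded; the G^2
test, the window size and the step length are illustrative choices only:

    import numpy as np
    from scipy.stats import chi2

    def g2_ci_test(x1, x2, x3):
        """p-value of a G^2 test of X1 _||_ X2 | X3 (integer-coded)."""
        g2, dof = 0.0, 0
        for s in np.unique(x3):                   # stratify on X3
            m = x3 == s
            table = np.zeros((x1.max() + 1, x2.max() + 1))
            np.add.at(table, (x1[m], x2[m]), 1.0)
            n = table.sum()
            if n == 0:
                continue
            expected = np.outer(table.sum(1), table.sum(0)) / n
            nz = table > 0
            g2 += 2.0 * (table[nz] * np.log(table[nz] / expected[nz])).sum()
            dof += (np.count_nonzero(table.sum(1)) - 1) * \
                   (np.count_nonzero(table.sum(0)) - 1)
        return chi2.sf(g2, max(dof, 1))           # small => dependence

    def ci_profile(x1, x2, x3, window=200, step=20):
        """p-value of the CI hypothesis in each sliding window."""
        return [(t, g2_ci_test(x1[t:t + window], x2[t:t + window],
                               x3[t:t + window]))
                for t in range(0, len(x1) - window + 1, step)]

A sudden drop in this p-value profile could then flag the onset of the
X1-X2 dependence before the state variables themselves look abnormal.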

What I want to do is to build a graphical model, e.g. based on Meila &
Jordan's ideas presented in "Estimating Dependency Structure as a Hidden
Variable" - NIPS'97 (or the Microsoft group's extension of that paper).
However, I need a continuous mixture parameter, because the process is
continuous - it drifts out of its stable situation (I assume it will not
make discrete jumps). Although this assumption was initially intended to
be simplifying, I have now realized that the continuity condition gives
me some trouble: I have left the class of finite mixture models and
entered the hierarchical models. Thus, Meila & Jordan's procedure won't
do the job in my case, as I have an unbounded number of models to build.
It is important to explicitly maintain the model structure at all times,
mainly as a tool for communicating with the domain experts, but also for
process visualization.

Several classes of models are possible in this situation, all making some
kind of assumption. For example:

o "Fading" the observations (i.e. giving the weight according to their
age), and use e.g. Friedman&Goldszmidt's method for "Sequential update of
Bayesian network structure"

o Partitioning the time axis into k (hopefully internally homogeneous)
segments, and using e.g. a KL-divergence penalty to force the different
models to be similar unless this assumption is strongly opposed by the
data (see the second sketch after this list).

o Introducing a hidden continuous variable and then using the
"Context-Specific Independence" of Boutilier et al. from UAI'96. (Not a
very good idea, as I have lots of discrete variables in my model.)

o Trying to express each p_ij as a function of some fixed parameters and
a time-evolving hidden variable in the spirit of an HMM, then at a given
time using e.g. the Chow-Liu tree method to extract a graphical model
(see the third sketch after this list). (Unfortunately this converges
very slowly even for very small models.)

o Extending Zweig & Russell's "Compositional modeling with DPNs".

o etc.
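
For the first item, here is a minimal sketch of what I mean by "fading":
exponentially down-weighted sufficient statistics that a sequential
structure-update procedure could score against. The decay rate and the
smoothing prior are hypothetical tuning knobs, not values from Friedman &
Goldszmidt's paper:

    from collections import defaultdict

    class FadedCounts:
        """Weighted counts N(x, parents) with exponential forgetting."""

        def __init__(self, fade=0.99):
            self.fade = fade
            self.counts = defaultdict(float)

        def update(self, x, parents):
            """Record one observation; `parents` is a tuple of values."""
            for key in self.counts:
                self.counts[key] *= self.fade   # age every old observation
            self.counts[(x, parents)] += 1.0    # newest one gets full weight

        def prob(self, x, parents, n_values, prior=1.0):
            """Faded ML estimate with a Dirichlet-style smoothing prior."""
            num = self.counts.get((x, parents), 0.0) + prior
            den = sum(self.counts.get((v, parents), 0.0)
                      for v in range(n_values)) + n_values * prior
            return num / den

Any structure score computed from such faded counts (e.g. BIC) will then
automatically emphasize the recent past.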
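
For the second item, a sketch of the kind of KL penalty I have in mind:
tie the CPTs of adjacent segments together with a divergence term. The
trade-off weight lam and the per-segment log-likelihoods are assumptions
for illustration:

    import numpy as np

    def kl(p, q, eps=1e-12):
        """KL(p || q) for two discrete distributions."""
        p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float((p * np.log(p / q)).sum())

    def segment_score(loglik_k, cpts_k, cpts_prev, lam=1.0):
        """Penalized score for segment k: fit minus disagreement with k-1."""
        penalty = sum(kl(p, q) for p, q in zip(cpts_k, cpts_prev))
        return loglik_k - lam * penalty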
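
And for the fourth item, a sketch of the Chow-Liu step: given (possibly
faded) observation weights, estimate the pairwise mutual informations and
keep the maximum-weight spanning tree. The integer coding of the data and
the weight vector are again illustrative assumptions:

    import numpy as np

    def mutual_info(xi, xj, w):
        """Weighted empirical mutual information of two coded columns."""
        joint = np.zeros((xi.max() + 1, xj.max() + 1))
        np.add.at(joint, (xi, xj), w)
        joint /= joint.sum()
        pi, pj = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
        nz = joint > 0
        return (joint[nz] * np.log(joint[nz] / (pi @ pj)[nz])).sum()

    def chow_liu_tree(data, w):
        """Edges of the Chow-Liu tree (Prim's algorithm on MI weights)."""
        n = data.shape[1]
        mi = np.array([[mutual_info(data[:, i], data[:, j], w)
                        if i != j else 0.0
                        for j in range(n)] for i in range(n)])
        in_tree, edges = {0}, []
        while len(in_tree) < n:
            i, j = max(((i, j) for i in in_tree for j in range(n)
                        if j not in in_tree), key=lambda e: mi[e])
            edges.append((i, j))
            in_tree.add(j)
        return edges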

As I am not that experienced with graphical models yet, I'd highly
appreciate comments on these or other implementation schemes. Has anyone
out there solved this problem? Has anyone evaluated similar models in
similar situations?

All comments are most welcome!

Regards
Helge Langseth