Re: Learning BNs with hidden variables

Daniel Nikovski (Daniel_Nikovski@gs147.sp.cs.cmu.edu)
Tue, 06 Apr 1999 17:50:57 -0400

I have had the same experience with the algorithm of Binder, Koller,
Russell, and Kanazawa mentioned above: it usually converges to local
maxima of the log-likelihood that are far from optimal. I compared it
with EM on the problem of learning transition probabilities between
hidden state nodes in temporal probabilistic networks; more details
can be found in:

Nikovski, D. (1998). Learning stationary temporal probabilistic
networks. Conference on Automated Learning and Discovery, Pittsburgh,
June 11-13, 1998. http://www.cs.cmu.edu/~danieln/conald.ps
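
For concreteness, here is a minimal sketch of the kind of EM update
involved: one Baum-Welch re-estimation step for the transition matrix
of a single hidden chain with discrete observations. This is
illustrative Python/NumPy written for this message, not code from
either paper; the restriction to a single chain and all variable
names are my own simplifications.

    import numpy as np

    def em_transition_update(obs, A, B, pi):
        # One Baum-Welch (EM) update of the transition matrix A for a
        # single hidden chain:
        #   obs : length-T sequence of discrete observation indices
        #   A   : (K, K) transitions, A[i, j] = P(s_{t+1}=j | s_t=i)
        #   B   : (K, M) emissions,   B[i, o] = P(o_t=o | s_t=i)
        #   pi  : (K,) initial state distribution
        T, K = len(obs), A.shape[0]

        # Forward pass, rescaled at every step to avoid underflow.
        alpha = np.zeros((T, K))
        scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]

        # Backward pass, using the same scale factors.
        beta = np.zeros((T, K))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

        # E-step: expected transition counts, summed over time.
        xi = np.zeros((K, K))
        for t in range(T - 1):
            xi += (alpha[t][:, None] * A *
                   (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / scale[t + 1]

        # M-step: renormalize the expected counts row by row.
        return xi / xi.sum(axis=1, keepdims=True)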

I have also experimented with continuous observation nodes whose
emission densities are Gaussian. The algorithm of Binder, Koller,
Russell, and Kanazawa works very well if the initial values of the
means and standard deviations are sufficiently close to their true
values, but converges to local maxima of the log-likelihood when they
are initialized randomly. Initializing from a k-means clustering of
the observations usually solves the problem, though, as has also been
found when learning hidden Markov models.
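
A minimal sketch of that initialization, assuming scalar observations
(again illustrative Python/NumPy, with names of my own choosing): run
plain k-means on the pooled observations, then read off per-cluster
means and standard deviations as starting values for the emission
densities.

    import numpy as np

    def kmeans_init_gaussians(x, k, iters=50, seed=0):
        # Initialize k Gaussian emission densities from a k-means
        # clustering of the scalar observations x.
        rng = np.random.default_rng(seed)
        x = np.asarray(x, dtype=float)
        # Start the centers at k distinct observations picked at random.
        centers = rng.choice(x, size=k, replace=False)
        for _ in range(iters):
            # Assign every observation to its nearest center ...
            labels = np.argmin(np.abs(x[:, None] - centers[None, :]),
                               axis=1)
            # ... and move each center to the mean of its cluster.
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = x[labels == j].mean()
        # Per-cluster standard deviations; fall back to the global one
        # for empty clusters.
        stds = np.array([x[labels == j].std() if np.any(labels == j)
                         else x.std() for j in range(k)])
        return centers, stds

The resulting centers and stds then serve as the starting means and
standard deviations for the subsequent gradient or EM run.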

Again, learning the CPTs of hidden nodes proceeds very slowly, orders
of magnitude more slowly than learning the emission probabilities,
and often converges to values different from the true ones (although
in their general vicinity).

Cheers,

--daniel

Frank Wittig <fwittig@cs.uni-sb.de> wrote:

>Dear colleagues,
>
>We have been using the Adaptive Probabilistic Networks method
>of Binder, Koller, Russell and Kanazawa (1997) to learn Bayesian
>networks with hidden variables. (The context is the modeling of
>unobservable properties of computer users.) We use
>Polak-Ribière's method for choosing the direction of the next
>hill-climbing step.
>
>Although the results so far have been reasonable, the learned
>nets seem to be local optima that could be improved upon. Also,
>the algorithm seems to concentrate its learning mainly on the
>CPTs that link a hidden variable with its observable children,
>leaving the CPT for the hidden variable largely unchanged, even
>when this initial CPT does not represent a particularly promising
>choice.
>
>We'd be interested to hear of methods that others have used to
>get good results with this type of algorithm - for example, by
>specifying constraints that the learned network ought to satisfy.
>
>We are also interested in experiences with implementations of
>the EM algorithm for learning Bayesian networks and how these or
>similar problems have been solved in that context.
>
>Thanks in advance,
>
> Frank
>
>Binder, J., Koller, D., Russell, S., & Kanazawa, K. (1997).
> Adaptive probabilistic networks with hidden variables.
> Machine Learning, 29, 213-244.
>
>
>============================================================
>Frank Wittig                 Email: fwittig@cs.uni-sb.de
>University of Saarbruecken   WWW:   http://w5.cs.uni-sb.de/~fwittig
>P.O. Box 15 11 50            Phone: +49 681 302 4135
>D-66041 Saarbruecken         Fax:   +49 681 302 4136
>============================================================
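
For readers unfamiliar with the Polak-Ribière method Frank mentions in
the quoted message: it picks each new search direction by mixing the
current gradient with the previous direction. A minimal sketch,
assuming flat parameter vectors and ascent on the log-likelihood (on
the very first step one simply ascends along the gradient):

    import numpy as np

    def polak_ribiere_direction(grad, grad_prev, dir_prev):
        # Polak-Ribiere conjugate-gradient direction for hill-climbing
        # (maximization), given the current gradient, the previous
        # gradient, and the previous search direction.
        beta = grad @ (grad - grad_prev) / (grad_prev @ grad_prev)
        beta = max(0.0, beta)          # common restart safeguard ("PR+")
        return grad + beta * dir_prev  # ascent direction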