In this assignment, you will implement Max propagation and then use it to implement the Hard EM algorithm. The code for this assignment is available here as a tar file.
The command line arguments for jtdriver
have been
extensively changed:
Usage: jtdriver [-v] [-l trainingdata] [-t testdata] -n networkfile
    -v               turn on trace output
    -l trainingdata  file of training examples
    -t testdata      file of test examples
    -n networkfile   file describing belief network

Also, the interactive commands have been changed:

Observe (o), query (q), MPE (m), reset (r), learn (EM) (l), learn (hard EM) (h), trace (t), untrace (u), randomize (z) or exit (x):

The new commands (MPE, learn EM, learn hard EM, and randomize) are explained below.
I suggest that you first implement Max propagation and test it on the simple max.net, which is the example that I used on the slides. After max propagation (MPE), the junction tree for this network should look like this:

    C(2 1) --- S(1) --- C(1 0)

    C(2 1)  (2 1):  0.06 0.28 0.24 0.12
    S(1)    (1):    0.24 0.28
    C(1 0)  (1 0):  0.24 0.16 0.07 0.28

From this we can see that the most likely configuration has probability 0.28 and corresponds to variable 2 = 0, variable 1 = 1, and variable 0 = 1.
You will need to write the following routines:
void jtnode::maxPropagation()
    This is analogous to huginPropagation.

probabilityTable * jtnode::collectMax(jtnode * caller)
    This is analogous to collectEvidence.

void jtnode::distributeMax(jtnode * caller, probabilityTable * incomingMessage)
    This is analogous to distributeEvidence.

jtnode * junctionTree::maxPropagate()
    This is analogous to propagate.
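To make the analogy concrete, here is a minimal sketch of how these four routines might fit together. It is not the required implementation: apart from maxOver (described further below), every member and helper used here (neighbors, table, separatorWith, nonSeparatorVars, multiplyBy, divideBy, copyFrom, nodes) is an assumed placeholder, and the real code should mirror huginPropagation, collectEvidence, and distributeEvidence in jt.C.

    // Sketch only: placeholder names throughout; mirror the Hugin routines in jt.C.

    probabilityTable * jtnode::collectMax(jtnode * caller)
    {
      // Collect max-messages from every neighbor except the caller.
      for (jtnode * nbr : neighbors) {
        if (nbr == caller) continue;
        probabilityTable * msg = nbr->collectMax(this);
        probabilityTable * sep = separatorWith(nbr);   // separator table on this edge
        table->multiplyBy(msg);                        // absorb the new message ...
        table->divideBy(sep);                          // ... and divide out the old one (Hugin update)
        sep->copyFrom(msg);
        delete msg;
      }
      if (caller == NULL) return NULL;                 // this node is the collect root
      // Max-marginalize away everything not on the separator shared with the caller.
      return table->maxOver(nonSeparatorVars(caller));
    }

    void jtnode::distributeMax(jtnode * caller, probabilityTable * incomingMessage)
    {
      if (incomingMessage != NULL) {                   // absorb the downward message
        probabilityTable * sep = separatorWith(caller);
        table->multiplyBy(incomingMessage);
        table->divideBy(sep);
        sep->copyFrom(incomingMessage);
      }
      for (jtnode * nbr : neighbors) {                 // pass max-messages on toward the leaves
        if (nbr == caller) continue;
        probabilityTable * msg = table->maxOver(nonSeparatorVars(nbr));
        nbr->distributeMax(this, msg);
        delete msg;
      }
    }

    void jtnode::maxPropagation()
    {
      collectMax(NULL);                                // inward pass, with this node as root
      distributeMax(NULL, NULL);                       // outward pass back to the leaves
    }

    jtnode * junctionTree::maxPropagate()
    {
      jtnode * root = nodes.front();                   // any cluster can serve as the root
      root->maxPropagation();
      return root;
    }

After maxPropagate returns, each cluster table should hold the max-marginal of the joint, which is what produces the output shown above for max.net.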
I am supplying you with an implementation of EM. I have tried to make the implementation very general. It works by using the conditional probability tables (CPTs) of the belief network as the "counters" in which to collect the statistics needed for the M step of EM. The main loop of EM applies each training example as evidence to the junction tree, computes P(H|O) for each hidden variable H given the observed variables O, and then increments the appropriate entries of the belief network CPTs. In more detail, the process works as follows.
Each training example consists of a vector of values, one for every node in the belief network (hidden or not). If the value is -1, this indicates that the value is missing, otherwise the value is observed, and it should be either 0 or 1. Different training examples may have different values missing. The code can handle this easily.
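For example, applying one training vector as evidence could look roughly like this; enterEvidence is a hypothetical name standing in for whatever routine the junction tree code actually uses to assert an observed value.

    #include <vector>

    // Sketch only: enterEvidence is a hypothetical stand-in for the real
    // evidence-entry routine.
    void applyExample(junctionTree * jt, const std::vector<int> & example)
    {
      for (int v = 0; v < (int) example.size(); v++) {
        if (example[v] == -1) continue;      // -1 marks a missing (hidden) value
        jt->enterEvidence(v, example[v]);    // observed values are 0 or 1
      }
    }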
The main loop of EM (junctionTree::EMIteration) applies each example as evidence and then performs Hugin propagation. After Hugin propagation, each cluster node of the junction tree, for example C(A,B,C), represents the (unnormalized) joint distribution P(A,B,C,e) of its variables and the evidence e. Now suppose that there is a node in the belief network whose conditional probability table is P(A|B). The algorithm marginalizes away C from P(A,B,C,e) to get P(A,B,e), normalizes this table to get P(A,B), and then adds the result into the CPT of the node P(A|B). Hence, during learning, the CPT represents counts of the form ~P(A,B). If A and B were both observed in the example, the table that is added contains only zeros and ones; but if one (or both) of the variables was unobserved, it will contain fractional "counts", as it should in EM. The code loops through every node of the belief network, finds the corresponding cluster node in the junction tree (by following the pointer jtcontainer), marginalizes a copy of that cluster node's table, and adds it into the CPT. The junction tree is then reset, and the next example is processed.
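In code form, the accumulation for a single example might look roughly like the following. Only jtcontainer, sumOver, and normalize are named in the text above; the node list, table, cpt, nonFamilyVars, and addInto are assumed placeholders, and I assume here that sumOver returns a new table with the listed variables marginalized away.

    // Sketch only: placeholder names except jtcontainer, sumOver and normalize.
    void bayesNetwork::accumulateEStepSketch()
    {
      for (node * n : nodes) {                          // every belief network node
        jtnode * cluster = n->jtcontainer;              // its cluster in the junction tree
        // e.g. C(A,B,C): marginalize away C to go from P(A,B,C,e) to P(A,B,e) ...
        probabilityTable * t = cluster->table->sumOver(nonFamilyVars(n));
        t->normalize();                                 // ... then normalize to get P(A,B)
        n->cpt->addInto(t);                             // add the fractional counts into the CPT
        delete t;
      }
    }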
After all of the training examples have been processed in this way, the program loops through the nodes of the belief network and normalizes their tables to convert them from joint distributions into conditional probability distributions. This completes the M step of EM. The code then re-computes the tables in the junction tree from the newly-computed tables of the belief network. This is done by bayesNetwork::updateJT. To make it efficient, we first fill all of the probability tables of the junction tree with 1's, and then loop through the nodes of the belief network and have each node multiply its CPT into the corresponding cluster node in the junction tree.
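A sketch of that update, with everything except the routine's name and the jtcontainer pointer treated as an assumed placeholder (including the signature and helper names):

    // Sketch only: fillTablesWithOnes, multiplyBy and the node list are placeholders.
    void bayesNetwork::updateJT(junctionTree * jt)
    {
      jt->fillTablesWithOnes();                    // start every junction tree table at 1
      for (node * n : nodes)                       // then multiply each CPT into its cluster
        n->jtcontainer->table->multiplyBy(n->cpt);
    }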
The code also computes the log likelihood of the training data. The computation is quite similar to the E step of EM. Each training example is entered into the junction tree as evidence, and then Hugin propagation is performed. This turns each cluster node's table into a joint distribution, for example P(A,B,C,e). By marginalizing away all of the variables in the table, we get P(e), the likelihood of the evidence.
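A rough sketch of the per-example computation, assuming the evidence has already been entered, and with all helper names other than propagate and sumOver being assumptions:

    #include <cmath>

    // Sketch only: table, allVars and value(0) are assumed placeholders.
    double exampleLogLikelihood(junctionTree * jt)
    {
      jtnode * root = jt->propagate();                        // Hugin propagation
      // Marginalizing away every variable in the root's table leaves P(e).
      probabilityTable * pe = root->table->sumOver(allVars(root));
      double logLik = log(pe->value(0));                      // the single remaining entry is P(e)
      delete pe;
      return logLik;
    }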
After each iteration of EM, the code prints the log likelihood of the training data and the test data.
The starting point for EM is determined by the initial belief network
which is read into the program. The file nb.net
is the
true model that was used to generate the data. The file
nb.random.net
is a randomly initialized belief network.
I provide this so that during debugging, you can have a repeatable
initial state. The files nb.train.data
and
nb.test.data
are the training and testing data sets.
You can also generate a random starting point for the EM by using the
"z" command to the jtdriver
program. This will randomize
the belief network and then recompute the probability tables in the
junction tree from the belief network.
The code for Hard EM should be exactly analogous to the code for EM, except that you will replace the call to propagate with a call to maxPropagate, and the call to accumulateEStep with a call to accumulateHardEStep.
In accumulateHardEStep, you will need to invoke two new routines from ptable.C (a sketch combining them follows their descriptions):
maxOver(vs): This routine is analogous to sumOver, except that it computes the maximum marginal. You can experiment with it using the separate driver program "maxover". You will also need it for implementing max propagation above.

maximize(vs): This routine is analogous to normalize, except that it forces the probability of the most likely cell to be 1.0 and the probability of all of the other cells to be 0.0.
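Putting the pieces together, the hard E step might look roughly like this; maxOver and maximize are the ptable.C routines just described (check ptable.h for their exact variable-set arguments), while the node list, table, cpt, nonFamilyVars, familyVars, and addInto are assumed placeholders. maxPropagate is called from the surrounding loop exactly where EMIteration calls propagate.

    // Sketch only: placeholder names except jtcontainer, maxOver and maximize.
    void bayesNetwork::accumulateHardEStepSketch()
    {
      for (node * n : nodes) {                          // every belief network node
        jtnode * cluster = n->jtcontainer;              // its cluster, after max propagation
        // Max-marginalize (rather than sum) down to this node's CPT family ...
        probabilityTable * t = cluster->table->maxOver(nonFamilyVars(n));
        // ... and, where accumulateEStep normalizes, commit to the single most
        // likely configuration: that cell becomes 1.0, every other cell 0.0.
        t->maximize(familyVars(n));
        n->cpt->addInto(t);                             // hard 0/1 "counts"
        delete t;
      }
    }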
Please do the following experiments:

1. In car.net, observe that the car will not start (variable 13 = 1). Show the most likely explanation for the car not starting (i.e., the most likely configuration of the resulting network) and its probability.

2. Using nb.random.net and four different random starting points, plot the training and testing log likelihood at convergence.

3. Using nb.random.net and four different random starting points, compare the amount of overfitting that you get.
Turn in your modified jt.C and bayes.C and the results of your experiments, both as email and as hardcopy.