CS539 Program 5 Due Monday March 6

In this assignment, you will implement Max propagation and then use it to implement the Hard EM algorithm. The code for this assignment is available here as a tar file.

The command line arguments for jtdriver have been extensively changed:

Usage: jtdriver [-v] [-l trainingdata] [-t testdata] -n networkfile
 -v                       turn on trace output
 -l trainingdata          file of training examples
 -t testdata              file of test examples
 -n networkfile           file describing belief network
Also, the interactive commands have been changed:
Observe (o), query (q), MPE (m), reset (r), 
learn (EM) (l), learn (hard EM) (h),
trace (t), untrace(u), randomize (z)  or exit (x): 
The new commands are explained in the sections that follow.

Max Propagation

I suggest that you first implement Max propagation and test it on the simple network max.net, which is the example I used in the slides. After max propagation (MPE), the junction tree for this network should look like this:

C(2 1) --- S(1) --- C(1 0) 

C(2 1) 
(2 1)
0.06 0.28 0.24 0.12 

S(1)
(1)
0.24 0.28 

C(1 0) 
(1 0)
0.24 0.16 0.07 0.28 
From this we can see that the most likely configuration has probability 0.28 and corresponds to variable 2 = 0, variable 1 = 1, and variable 0 = 1.
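In your maxPropagate, the only change from ordinary Hugin propagation is that messages are computed by max-marginalizing instead of sum-marginalizing; the division by the old separator table and the multiplication into the neighboring cluster stay the same. Below is a minimal, self-contained sketch of max-marginalization. The ToyTable type and its indexing convention are inventions for this sketch, not the ptable.C interface; the numbers are the C(2 1) table from the output above, and the result matches the separator S(1).

// A minimal, self-contained sketch of the max-marginalization that max
// propagation substitutes for ordinary (sum) marginalization.  ToyTable and
// its indexing convention are inventions for this sketch and do not reflect
// the actual ptable.C interface.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

struct ToyTable {
    std::vector<int>    vars;  // variable ids; vars[0] varies fastest here
    std::vector<double> p;     // 2^vars.size() entries (binary variables)
};

// Keep only the variables in `keep`, combining entries with max instead of +.
ToyTable maxMarginalize(const ToyTable& t, const std::vector<int>& keep) {
    ToyTable out;
    out.vars = keep;
    out.p.assign(std::size_t(1) << keep.size(), 0.0);
    for (std::size_t i = 0; i < t.p.size(); ++i) {
        std::size_t j = 0;
        for (std::size_t k = 0; k < keep.size(); ++k) {
            std::size_t pos =
                std::find(t.vars.begin(), t.vars.end(), keep[k]) - t.vars.begin();
            if ((i >> pos) & 1u) j |= std::size_t(1) << k;
        }
        out.p[j] = std::max(out.p[j], t.p[i]);  // max, not sum
    }
    return out;
}

int main() {
    // The C(2 1) table from the max.net output above
    // (in this toy's ordering, variable 1 varies fastest).
    ToyTable c21;
    c21.vars = {1, 2};
    c21.p    = {0.06, 0.28, 0.24, 0.12};
    ToyTable s1 = maxMarginalize(c21, {1});
    printf("%.2f %.2f\n", s1.p[0], s1.p[1]);  // prints 0.24 0.28, matching S(1)
    return 0;
}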

You will need to write the following routines:

The EM Implementation

I am supplying you with an implementation of EM. I have tried to make the implementation very general. It works by using the conditional probability tables (CPTs) of the belief network as the "counters" in which to collect the statistics needed for the M step of EM. The main loop of EM applies each training example as evidence to the junction tree, computes P(H|O) for each hidden variable H given the observed variables O, and then increments the appropriate entries of the belief network CPTs. In more detail, the process works as follows.

Each training example consists of a vector of values, one for every node in the belief network (hidden or not). A value of -1 indicates that the value is missing; otherwise the value is observed and must be either 0 or 1. Different training examples may have different values missing; the code handles this easily.

The main loop of EM (junctionTree::EMIteration) applies each example as evidence and then performs Hugin propagation. After Hugin propagation, each cluster node of the junction tree, for example C(A,B,C), represents the (unnormalized) joint distribution P(A,B,C,e) of its variables and the evidence e. Now suppose that there is a node in the belief network whose conditional probability table is P(A|B). The algorithm marginalizes away C from P(A,B,C,e) to get P(A,B,e), normalizes this table to get P(A,B|e), and then adds the resulting table into the CPT of that node. Hence, during learning, the CPT holds expected counts of the form tildeP(A,B). If A and B were both observed, the table that is added contains only zeros and ones; if one (or both) of the variables was unobserved, it contains fractional "counts", as it should in EM. The code loops through every node of the belief network, finds the corresponding cluster node in the junction tree (by following the pointer jtcontainer), marginalizes a copy of the cluster node's table, and adds it into the CPT. The junction tree is then reset, and the next example is processed.
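The important point is that the same update handles observed and missing values uniformly. The following self-contained toy (all tables and numbers are made up, and it does not use the real jt.C/bayes.C classes) accumulates the normalized family posterior for a node with CPT P(A|B) over two examples: in the first, both A and B are observed, so the contribution is a 0/1 table; in the second, A is missing, so the contribution is fractional.

// Self-contained toy of the soft-count accumulation described above for a
// node with CPT P(A|B).  The posterior tables would come from the junction
// tree after Hugin propagation; here they are simply hard-coded.
#include <cstdio>

int main() {
    // counts[a][b] accumulates the expected counts tildeP(A=a, B=b).
    double counts[2][2] = {{0.0, 0.0}, {0.0, 0.0}};

    // Example 1: A and B both observed (A=1, B=0) -> posterior is all 0/1.
    double post1[2][2] = {{0.0, 0.0},
                          {1.0, 0.0}};
    // Example 2: B observed (B=1), A missing -> fractional posterior P(A,B|e).
    double post2[2][2] = {{0.0, 0.3},
                          {0.0, 0.7}};

    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            counts[a][b] += post1[a][b] + post2[a][b];

    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            printf("count(A=%d, B=%d) = %.1f\n", a, b, counts[a][b]);
    return 0;
}

These accumulated counts are exactly what the M step, described next, normalizes into CPTs.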

After all of the training examples have been processed in this way, the program loops through the nodes of the belief network and normalizes their tables to convert them from joint distributions into conditional probability distributions. This completes the M step of EM. The code then recomputes the tables in the junction tree from the newly computed tables of the belief network. This is done by bayesNetwork::updateJT. To make it efficient, we first fill all of the probability tables of the junction tree with 1's, and then loop through the nodes of the belief network and have each node multiply its CPT into the corresponding cluster node in the junction tree.
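As a concrete illustration of both halves of this step, the toy below (again with made-up numbers, not the real classes) normalizes the count table accumulated in the previous sketch into P(A|B) and then rebuilds a cluster potential by filling it with 1's and multiplying the new CPT in. In the real code the cluster usually contains additional variables, but the idea is the same.

// Self-contained toy of the M step for one node with CPT P(A|B), followed by
// the junction-tree refresh idea used by bayesNetwork::updateJT.
#include <cstdio>

int main() {
    // Expected counts tildeP(A, B) accumulated during the E step
    // (the same numbers produced by the previous sketch).
    double counts[2][2] = {{0.0, 0.3},
                           {1.0, 0.7}};

    // M step: normalize each parent configuration B=b to get P(A | B=b).
    double cpt[2][2];
    for (int b = 0; b < 2; ++b) {
        double z = counts[0][b] + counts[1][b];
        for (int a = 0; a < 2; ++a) {
            cpt[a][b] = (z > 0.0) ? counts[a][b] / z : 0.5;  // fall back to uniform
            printf("P(A=%d | B=%d) = %.2f\n", a, b, cpt[a][b]);
        }
    }

    // Junction-tree refresh: reset the cluster potential to all 1's, then
    // multiply the new CPT into it (every node does this to its container).
    double cluster[2][2];
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            cluster[a][b] = 1.0 * cpt[a][b];
    printf("refreshed cluster entry (A=1, B=1) = %.2f\n", cluster[1][1]);
    return 0;
}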

The code also computes the log likelihood of the training data. The computation is quite similar to the E step of EM. Each training example is entered into the junction tree as evidence, and then Hugin propagation is performed. This makes each cluster node's table an unnormalized joint distribution, for example P(A,B,C,e). Marginalizing away all of the variables in the table leaves P(e), the likelihood of the evidence; summing log P(e) over the examples gives the log likelihood.
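For example (with made-up numbers), if after entering one example's evidence and propagating, a cluster over A and B holds the unnormalized table below, then summing its entries gives P(e) for that example:

// Self-contained toy of the likelihood computation: after evidence is entered
// and Hugin propagation is run, any one cluster table holds an unnormalized
// joint such as P(A, B, e); summing out everything leaves P(e).
#include <cmath>
#include <cstdio>

int main() {
    // Made-up unnormalized cluster table P(A, B, e) for one training example.
    double table[2][2] = {{0.02, 0.10},
                          {0.18, 0.05}};

    double pe = 0.0;                    // P(e): marginalize away A and B
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            pe += table[a][b];

    double logLik = std::log(pe);       // this example's contribution
    printf("P(e) = %.2f, log P(e) = %.3f\n", pe, logLik);
    // Summing log P(e) over all training (or test) examples gives the
    // log likelihood that is printed after each EM iteration.
    return 0;
}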

After each iteration of EM, the code prints the log likelihood of the training data and the test data.

The starting point for EM is determined by the initial belief network which is read into the program. The file nb.net is the true model that was used to generate the data. The file nb.random.net is a randomly initialized belief network. I provide this so that during debugging, you can have a repeatable initial state. The files nb.train.data and nb.test.data are the training and testing data sets.

You can also generate a random starting point for the EM by using the "z" command to the jtdriver program. This will randomize the belief network and then recompute the probability tables in the junction tree from the belief network.

Hard EM

The code for Hard EM should be exactly analogous to the code for EM, except that you replace the call to propagate with a call to maxPropagate and the call to accumulateEStep with a call to accumulateHardEStep. In accumulateHardEStep, you will need to invoke two new routines from ptable.C:
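Whatever form those routines take, the hard E step is conceptually simple: instead of adding the whole normalized family posterior as fractional counts, it adds a count of 1 at the single most likely configuration found by max propagation. The self-contained toy below (made-up numbers and plain arrays, not the real ptable.C interface) contrasts the two updates for a node with CPT P(A|B).

// Self-contained toy contrasting the soft and hard E-step updates for one
// node with CPT P(A|B).  The table would come from the junction tree after
// maxPropagate in the real code; here it is hard-coded.
#include <cstdio>

int main() {
    // Normalized (max-)marginal table over the family (A, B) for one example.
    double post[2][2] = {{0.05, 0.30},
                         {0.10, 0.55}};

    double softCounts[2][2] = {{0, 0}, {0, 0}};
    double hardCounts[2][2] = {{0, 0}, {0, 0}};

    int bestA = 0, bestB = 0;
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {
            softCounts[a][b] += post[a][b];          // soft EM: fractional counts
            if (post[a][b] > post[bestA][bestB]) {   // track the most likely entry
                bestA = a;
                bestB = b;
            }
        }
    hardCounts[bestA][bestB] += 1.0;                 // hard EM: one unit count

    printf("soft count at (A=1, B=1): %.2f\n", softCounts[1][1]);
    printf("hard count at (A=%d, B=%d): %.2f\n", bestA, bestB,
           hardCounts[bestA][bestB]);
    return 0;
}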

Experiments to Perform

Turn in your revised versions of jt.C and bayes.C and the results of your experiments, both by email and as hardcopy.