# CS539 Program 5 Due Monday March 6

In this assignment, you will implement Max propagation and then use it to implement the Hard EM algorithm. The code for this assignment is available here as a tar file.

The command line arguments for `jtdriver` have been extensively changed:

```
Usage: jtdriver [-v] [-l trainingdata] [-t testdata] -n networkfile
-v                       turn on trace output
-l trainingdata          file of training examples
-t testdata              file of test examples
-n networkfile           file describing belief network
```
Also, the interactive commands have been changed:
```
Observe (o), query (q), MPE (m), reset (r),
learn (EM) (l), learn (hard EM) (h),
trace (t), untrace (u), randomize (z) or exit (x):
```
Here is an explanation:
• Observe: observe the value of one of the variables
• Query: query the value of one of the variables
• MPE: compute most probable explanation (you must implement)
• Reset: reset the belief net and junction tree to their "saved versions". New versions are saved at the following times: (a) after reading in the initial network, (b) after randomizing the network, and (c) after running a learning algorithm (EM or HardEM).
• Learn: Run the EM algorithm on the training data until convergence.
• Learn (hard EM): run the Hard EM algorithm on the training data (you must implement this).
• Trace: set the global TRACE to 1.
• Untrace: set the global TRACE to 0.
• Randomize: fill the belief network with random values and then use it to recompute the corresponding junction tree tables. This is useful for generating new starting points for EM and HardEM.

#### Max Propagation

I suggest that you first implement Max propagation and test it on the simple `max.net`, which is the example that I used on the slides. After max propagation (MPE), the junction tree for this network should look like this:

```
C(2 1) --- S(1) --- C(1 0)

C(2 1)
(2 1)
0.06 0.28 0.24 0.12

S(1)
(1)
0.24 0.28

C(1 0)
(1 0)
0.24 0.16 0.07 0.28
```
From this output we can see that the most likely configuration has probability 0.28 and corresponds to variable 2 = 0, variable 1 = 1, and variable 0 = 1.

You will need to write the following routines:

• `void jtnode::maxPropagation()` This is analogous to `huginPropagation`.

• `probabilityTable * jtnode::collectMax(jtnode * caller)` This is analogous to `collectEvidence`.

• `void jtnode::distributeMax(jtnode * caller, probabilityTable * incomingMessage)` This is analogous to `distributeEvidence`.

• `jtnode * junctionTree::maxPropagate()` This is analogous to `propagate`.

#### The EM Implementation

I am supplying you with an implementation of EM. I have tried to make the implementation very general. It works by using the conditional probability tables (CPTs) of the belief network as the "counters" in which to collect the statistics needed for the M step of EM. The main loop of EM applies each training example as evidence to the junction tree, computes P(H|O) for each hidden variable H given the observed variables O, and then increments the appropriate entries of the belief network CPTs. In more detail, the process works as follows.

Each training example consists of a vector of values, one for every node in the belief network (hidden or not). If the value is -1, this indicates that the value is missing, otherwise the value is observed, and it should be either 0 or 1. Different training examples may have different values missing. The code can handle this easily.

The main loop of EM (`junctionTree::EMIteration`) applies each example as evidence and then performs Hugin propagation. After Hugin propagation, each cluster node of the junction tree, for example, C(A,B,C), represents the unnormalized joint distribution P(A,B,C,e) of its variables and the evidence e. Now suppose that there is a node in the belief network whose conditional probability table is P(A|B). The algorithm marginalizes away C from P(A,B,C,e) to get P(A,B,e), normalizes this table to get P(A,B|e), and then adds this probability table to the CPT of the node for P(A|B). Hence, during learning, the CPT holds expected counts of the form tilde-P(A,B). If A and B are both observed, the per-example contribution contains only zeros and ones. But if one (or both) of the variables is unobserved, the table receives fractional "counts", as it should in EM. The code loops through every node of the belief network, finds the corresponding cluster node in the junction tree (by following the pointer `jtcontainer`), marginalizes a copy of the cluster node's table, and adds it into the CPT. The junction tree is then reset, and the next example is processed.

After all of the training examples have been processed in this way, the program loops through the nodes of the belief network and normalizes their tables to convert them from joint distributions into conditional probability distributions. This completes the M step of EM. The code then re-computes the tables in the junction tree from the newly-computed tables of the belief network. This is done by `bayesNetwork::updateJT`. To make it efficient, we first fill all of the probability tables of the junction tree with 1s, and then loop through the nodes of the belief network and have each node multiply its CPT into the corresponding cluster node in the junction tree.

The code also computes the log likelihood of the training data. The computation is quite similar to the E step of EM. Each training example is entered into the junction tree as evidence, and then Hugin propagation is performed. This makes each cluster node's table represent an unnormalized joint distribution, for example, P(A,B,C,e). By marginalizing away all of the variables in the table, we get P(e). This gives the likelihood of the evidence.

After each iteration of EM, the code prints the log likelihood of the training data and the test data.

The starting point for EM is determined by the initial belief network which is read into the program. The file `nb.net` is the true model that was used to generate the data. The file `nb.random.net` is a randomly initialized belief network. I provide this so that during debugging, you can have a repeatable initial state. The files `nb.train.data` and `nb.test.data` are the training and testing data sets.

You can also generate a random starting point for the EM by using the "z" command to the `jtdriver` program. This will randomize the belief network and then recompute the probability tables in the junction tree from the belief network.

#### Hard EM

The code for Hard EM should be exactly analogous to the code for EM except that you will replace the call to `propagate` with a call to `maxPropagate` and you should replace the call to `accumulateEStep` with `accumulateHardEStep`. In `accumulateHardEStep`, you will need to invoke two new routines from `ptable.C`:

• `maxOver(vs)`. This routine is analogous to `sumOver`, except that it computes the maximum marginal. You can experiment with it using the separate driver program "maxover". You will also need it for implementing max propagation above.

• `maximize(vs)`. This routine is analogous to `normalize`, except that it forces the probability of the most likely cell to be 1.0 and the probability of all of the other cells to be 0.0.

#### Experiments to Perform

• Implement Max propagation and apply it to the belief network `car.net` after observing that the car will not start (variable 13 = 1). Show the most likely explanation for the car not starting (i.e., the most likely configuration of the resulting network) and its probability.

• Run EM on the naive Bayes network for the given random starting point `nb.random.net` and for four different random starting points, and plot the training and testing log likelihood at convergence.

• In the code, there is a comment that tells how to impose a Dirichlet prior. Add a Dirichlet prior of 1.0. Rerun EM on the given starting network `nb.random.net` and on four different random starting points and compare the amount of overfitting that you get.

• Implement Hard EM and repeat the same experiments as with EM.

Turn in your revised versions of `jt.C` and `bayes.C` and the results of your experiments, both as email and as hardcopy.