In this assignment, you will implement Max propagation and then use it to implement the Hard EM algorithm. The code for this assignment is available here as a tar file.

The command line arguments for `jtdriver` have been extensively changed:

```
Usage: jtdriver [-v] [-l trainingdata] [-t testdata] -n networkfile
  -v               turn on trace output
  -l trainingdata  file of training examples
  -t testdata      file of test examples
  -n networkfile   file describing belief network
```

Also, the interactive commands have been changed:

```
Observe (o), query (q), MPE (m), reset (r), learn (EM) (l),
learn (hard EM) (h), trace (t), untrace (u), randomize (z) or exit (x):
```

Here is an explanation:

- Observe: observe the value of one of the variables
- Query: query the value of one of the variables
- MPE: compute most probable explanation (you must implement)
- Reset: reset the belief net and junction tree to their "saved versions". New versions are saved at the following times: (a) after reading in the initial network, (b) after randomizing the network, and (c) after running a learning algorithm (EM or HardEM).
- Learn: Run the EM algorithm on the training data until convergence.
- Hard EM learn: run the Hard EM algorithm on the training data (you must implement this).
- Trace: set the global TRACE to 1.
- Untrace: set the global TRACE to 0.
- Randomize: fill the belief network with random values and then use it to recompute the corresponding junction tree tables. This is useful for generating new starting points for EM and HardEM.

I suggest that you first implement Max propagation and test it on the
simple `max.net`, which is the example that I used on the slides. After
max propagation (MPE), the junction tree for this network should look
like this:

```
C(2 1) --- S(1) --- C(1 0)

C(2 1)
(2 1)
0.06 0.28
0.24 0.12

S(1)
(1)
0.24 0.28

C(1 0)
(1 0)
0.24 0.16
0.07 0.28
```

From this we can see that the most likely configuration has probability 0.28 and corresponds to variable 2 = 0, variable 1 = 1, and variable 0 = 1.
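A useful sanity check: after max propagation, the largest entry of every clique and separator table equals the MPE probability. A minimal check against the numbers above (`tableMax` is a hypothetical helper, not part of the assignment code):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: the largest entry of a flat probability table.
double tableMax(const std::vector<double>& t) {
    return *std::max_element(t.begin(), t.end());
}
```

Each of the three tables in the example should report the same maximum, 0.28.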

You will need to write the following routines:

- `void jtnode::maxPropagation()`: analogous to `huginPropagation`.
- `probabilityTable * jtnode::collectMax(jtnode * caller)`: analogous to `collectEvidence`.
- `void jtnode::distributeMax(jtnode * caller, probabilityTable * incomingMessage)`: analogous to `distributeEvidence`.
- `jtnode * junctionTree::maxPropagate()`: analogous to `propagate`.
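To make the analogy concrete, here is a hedged sketch of the collect/distribute pattern on the smallest possible junction tree: two cliques A(a,s) and B(s,b) joined by the separator S(s), all variables binary. The flat-array tables and function names are illustrative assumptions, not the assignment's actual `jtnode`/`probabilityTable` API.

```cpp
#include <algorithm>
#include <array>

// Toy tables: a 4-entry clique over two binary variables and a
// 2-entry separator. Index = first_var * 2 + second_var.
using Table4 = std::array<double, 4>;
using Table2 = std::array<double, 2>;

// Like maxOver: marginalize a clique onto the separator by maximizing.
Table2 maxOverToSep(const Table4& t, bool sepIsFirst) {
    Table2 m{0.0, 0.0};
    for (int i = 0; i < 4; ++i) {
        int s = sepIsFirst ? i / 2 : i % 2;
        m[s] = std::max(m[s], t[i]);
    }
    return m;
}

// Hugin absorption: multiply newSep/oldSep into the clique table.
void absorb(Table4& t, const Table2& newSep, const Table2& oldSep,
            bool sepIsFirst) {
    for (int i = 0; i < 4; ++i) {
        int s = sepIsFirst ? i / 2 : i % 2;
        t[i] *= newSep[s] / oldSep[s];
    }
}

// collectMax from B into the root A, then distributeMax back to B.
// Afterwards the largest entry of every table is the MPE probability.
double maxPropagate(Table4& cliqueA, Table4& cliqueB, Table2& sep) {
    Table2 up = maxOverToSep(cliqueB, /*sepIsFirst=*/true);    // collect
    absorb(cliqueA, up, sep, /*sepIsFirst=*/false);
    sep = up;
    Table2 down = maxOverToSep(cliqueA, /*sepIsFirst=*/false); // distribute
    absorb(cliqueB, down, sep, /*sepIsFirst=*/true);
    sep = down;
    return *std::max_element(cliqueA.begin(), cliqueA.end());
}
```

With cliqueA initialized to P(a,s), cliqueB to P(b|s), and the separator to all 1's, the returned value is the maximum of the joint over (a,s,b), and both clique tables end up holding max-marginals.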

I am supplying you with an implementation of EM. I have tried to make
the implementation very general. It works by using the conditional
probability tables (CPTs) of the belief network as the "counters" in
which to collect the statistics needed for the M step of EM. The main
loop of EM applies each training example as evidence to the junction
tree, computes *P(H|O)* for each hidden variable *H* given
the observed variables *O*, and then increments the appropriate
entries of the belief network CPTs. In more detail, the process works
as follows.

Each training example consists of a vector of values, one for every node in the belief network (hidden or not). A value of -1 indicates that the value is missing; otherwise the value is observed and should be either 0 or 1. Different training examples may have different values missing. The code can handle this easily.
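The missing-value convention can be sketched as follows; `applyExample` and the `observe` callback are illustrative stand-ins, not the assignment's actual evidence-entry routines:

```cpp
#include <functional>
#include <vector>

// Hedged sketch of applying one training example as evidence:
// -1 marks a missing value; anything else (0 or 1) is observed.
void applyExample(const std::vector<int>& example,
                  const std::function<void(int var, int value)>& observe) {
    for (int var = 0; var < (int)example.size(); ++var)
        if (example[var] != -1)          // skip missing values
            observe(var, example[var]);  // enter observed value as evidence
}
```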

The main loop of EM (`junctionTree::EMIteration`) applies each example as evidence and then performs Hugin propagation. After
Hugin propagation, each cluster node of the junction tree, for
example, *C(A,B,C)*, will represent the (unnormalized) joint
distribution of some set of variables *P(A,B,C,e)* and the
evidence *e*. Now suppose that there is a node in the belief
network whose conditional probability table is *P(A|B)*. The
algorithm marginalizes away *C* from *P(A,B,C,e)* to get
*P(A,B,e)*, normalizes this table to get *P(A,B)*, and then
adds this probability table to the CPT of the node *P(A|B)*.
Hence, during learning, the CPT represents counts of the form
*P̃(A,B)*. If *A* and *B* were both observed, then
the CPT will contain only zeros and ones. But if one (or both) of the
variables was unobserved, then the table will contain fractional
"counts", as it should in EM. The code loops through every node of
the belief network, finds the corresponding cluster node in the
junction tree (by following the pointer `jtcontainer`), marginalizes a copy of the cluster node's table, and adds it into the CPT.
The junction tree is then reset, and the next example is processed.
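The marginalize/normalize/accumulate step for one CPT can be sketched like this, on a toy clique over three binary variables (A,B,C) with flat index a*4 + b*2 + c; `accumulateEStep` here is an illustrative stand-in for the supplied code:

```cpp
#include <array>
#include <numeric>

using Tab8 = std::array<double, 8>;  // clique table over (A,B,C)
using Tab4 = std::array<double, 4>;  // CPT "counter" over (A,B)

// One E-step accumulation for a CPT P(A|B): sum out C from the clique
// table P(A,B,C,e), normalize to get P(A,B|e), and add the result into
// the running CPT counts (fractional when A or B was unobserved).
void accumulateEStep(const Tab8& clique, Tab4& cpt) {
    Tab4 m{};                                        // sum out C
    for (int i = 0; i < 8; ++i) m[i / 2] += clique[i];
    double z = std::accumulate(m.begin(), m.end(), 0.0);  // = P(e)
    for (int i = 0; i < 4; ++i) cpt[i] += m[i] / z;  // add P(A,B|e)
}
```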

After all of the training examples have been processed in this way,
the program loops through the nodes of the belief network and
normalizes them to convert them from joint distributions into
conditional probability distributions. This completes the M step of
EM. The code then re-computes the tables in the junction tree from
the newly-computed tables of the belief network. This is done by `bayesNetwork::updateJT`. To make it efficient, we first fill all of the probability tables of the junction tree with 1's, and then loop through the nodes of the belief network and have each node multiply its CPT into the corresponding cluster node in the junction tree.
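The fill-with-ones trick can be sketched like this (toy flat tables; this `updateJT` is an illustrative stand-in that assumes each CPT is assigned to a clique over the same variables):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hedged sketch of bayesNetwork::updateJT: reset every clique table to
// all 1's, then multiply each CPT pointwise into its assigned clique.
void updateJT(std::vector<std::vector<double>>& cliques,
              const std::vector<std::pair<int, std::vector<double>>>& cpts) {
    for (auto& t : cliques)
        std::fill(t.begin(), t.end(), 1.0);        // step 1: all ones
    for (const auto& [cliqueIdx, cpt] : cpts)      // step 2: multiply in
        for (size_t i = 0; i < cpt.size(); ++i)
            cliques[cliqueIdx][i] *= cpt[i];
}
```

Starting from all 1's means stale values from the previous iteration are wiped out, and multiplying in each CPT exactly once rebuilds the clique potentials.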

The code also computes the log likelihood of the training data. The
computation is quite similar to the E step of EM. Each training
example is entered into the junction tree as evidence, and then Hugin
propagation is performed. This makes each node into a joint
distribution, for example, *P(A,B,C,e)*. By marginalizing away
*all* of the variables in the table, we get *P(e)*. This
gives the likelihood of the evidence.
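This computation can be sketched as follows; the flat-table representation and function names are illustrative assumptions:

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// After propagation with example e entered, a clique table holds an
// unnormalized joint P(vars, e); summing out *all* of its variables
// gives P(e).
double evidenceProbability(const std::vector<double>& cliqueTable) {
    return std::accumulate(cliqueTable.begin(), cliqueTable.end(), 0.0);
}

// The data log likelihood adds log P(e) over the examples, given one
// propagated clique table per example.
double logLikelihood(const std::vector<std::vector<double>>& tablePerExample) {
    double ll = 0.0;
    for (const auto& t : tablePerExample)
        ll += std::log(evidenceProbability(t));
    return ll;
}
```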

After each iteration of EM, the code prints the log likelihood of the training data and the test data.

The starting point for EM is determined by the initial belief network
which is read into the program. The file `nb.net` is the true model that was used to generate the data. The file `nb.random.net` is a randomly initialized belief network. I provide this so that during debugging, you can have a repeatable initial state. The files `nb.train.data` and `nb.test.data` are the training and testing data sets.

You can also generate a random starting point for the EM by using the
"z" command to the `jtdriver` program. This will randomize
the belief network and then recompute the probability tables in the
junction tree from the belief network.

The code for Hard EM should be exactly analogous to the code for EM
except that you will replace the call to `propagate` with a call to `maxPropagate`, and replace the call to `accumulateEStep` with a call to `accumulateHardEStep`. In `accumulateHardEStep`, you will need to invoke two new routines from `ptable.C`:

- `maxOver(vs)`: analogous to `sumOver`, except that it computes the maximum marginal rather than the sum. You can experiment with it using the separate driver program "maxover". You will also need it for implementing max propagation above.
- `maximize(vs)`: analogous to `normalize`, except that it forces the probability of the most likely cell to be 1.0 and the probability of all of the other cells to be 0.0.
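The behavior of these two routines can be sketched on a toy 4-entry table over two binary variables (the flat indexing and signatures are illustrative assumptions, not the actual `ptable.C` interface):

```cpp
#include <algorithm>
#include <array>

// Sketch of maxOver: like sumOver, but keeps the maximum instead of the
// sum when marginalizing away the second variable (index = v1*2 + v2).
std::array<double, 2> maxOver(const std::array<double, 4>& t) {
    return { std::max(t[0], t[1]), std::max(t[2], t[3]) };
}

// Sketch of maximize: like normalize, but sends the most likely cell to
// 1.0 and every other cell to 0.0.
void maximize(std::array<double, 4>& t) {
    auto best = std::max_element(t.begin(), t.end());
    for (auto& x : t) x = 0.0;
    *best = 1.0;
}
```

`maximize` is what makes the E step "hard": instead of fractional posterior counts, each example contributes a single winner-take-all configuration.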

- Implement Max propagation and apply it to the belief network `car.net` after observing that the car will not start (variable 13 = 1). Show the most likely explanation for the car not starting (i.e., the most likely configuration of the resulting network) and its probability.
- Run EM on the naive Bayes network for the given random starting point `nb.random.net` and for 4 different random starting points, and plot the training and testing log likelihood at convergence.
- In the code, there is a comment that tells how to impose a Dirichlet prior. Add a Dirichlet prior of 1.0. Rerun EM on the given starting network `nb.random.net` and on four different random starting points, and compare the amount of overfitting that you get.
- Implement Hard EM and repeat the same experiments as with EM.

Turn in `jt.C` and `bayes.C` and the results of your experiments, both as email and as hardcopy.