The final exam will cover *everything* in the course, but at least two-thirds
of it will cover material that we have discussed since the midterm.
That material includes:

- Machine Learning and Pattern Recognition
- Definition of the classification learning problem.
- Neural networks:
- weights, bias values, sigmoid functions.
- error function J(S,W), gradient descent search
- stochastic gradient descent, conjugate gradient descent
- measuring error using holdout data.
- the problem of overfitting; solving via early stopping.
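
A minimal sketch of the neural network items above, assuming a single sigmoid unit, squared error for J(S,W), and a toy AND data set. The function names, learning rate, and patience value are illustrative assumptions, not anything specified by the course. It runs stochastic gradient descent and stops early once holdout error stops improving.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of one sigmoid unit with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def error(S, w, b):
    """Assumed form of J(S, W): half the summed squared error over S."""
    return 0.5 * sum((predict(w, b, x) - y) ** 2 for x, y in S)

def sgd_epoch(S, w, b, lr):
    """One pass of stochastic gradient descent: update after every example."""
    for x, y in S:
        out = predict(w, b, x)
        delta = (out - y) * out * (1.0 - out)  # dJ/dz for this example
        w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
        b = b - lr * delta
    return w, b

def train(train_set, holdout_set, epochs=200, lr=0.5, patience=10):
    """Keep the weights with the lowest holdout error (early stopping)."""
    w, b = [0.0] * len(train_set[0][0]), 0.0
    best_err, best_w, best_b = error(holdout_set, w, b), w, b
    since_best = 0
    for _ in range(epochs):
        w, b = sgd_epoch(train_set, w, b, lr)
        err = error(holdout_set, w, b)
        if err < best_err:
            best_err, best_w, best_b = err, w, b
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # holdout error has stopped improving
    return best_w, best_b

# Toy data: logical AND. The holdout set reuses the same four points
# only to keep the example tiny; real holdout data must be separate.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_data, and_data)
```

Returning the best weights seen, rather than the last ones, is what makes this early stopping rather than a fixed epoch budget.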

- Decision trees:
- structure and how they are executed to classify data points.
- top-down divide-and-conquer method for growing trees.
- Scoring a proposed splitting test using mutual information.
- Pruning using a validation set.
- Converting trees to rules.
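
Scoring a splitting test by mutual information can be sketched as follows. This assumes discrete class labels and a test with discrete outcomes; the function names are illustrative. The score is the entropy of the labels minus the expected entropy of the labels after the split.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(labels, test_outcomes):
    """Entropy of labels minus expected entropy after splitting on the test."""
    n = len(labels)
    remainder = 0.0
    for outcome in set(test_outcomes):
        subset = [y for y, t in zip(labels, test_outcomes) if t == outcome]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

labels = [1, 1, 0, 0]
perfect_test = [True, True, False, False]   # separates the classes exactly
useless_test = [True, False, True, False]   # tells us nothing about the class
```

When growing a tree top-down, the test with the highest score at a node would be chosen, and the procedure repeated on each resulting subset.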

- Nearest neighbor method:
- finding the k nearest neighbors.
- importance of the distance metric.
- problems with noisy or irrelevant features.
- finding the nearest neighbor using a kd tree.
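
A brute-force sketch of k-nearest-neighbor classification, assuming Euclidean distance and majority voting. It scans every training point rather than using a kd tree, and the names and data are illustrative only. The distance function is a parameter because the choice of metric determines which points count as "near".

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(train, query, k=3, distance=euclidean):
    """Return the majority label among the k training points nearest to query."""
    neighbors = sorted(train, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-cluster data set.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```

Because every feature contributes to the distance, appending a noisy or irrelevant coordinate to each point can change which neighbors are nearest, which is the problem noted in the list above.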

- Reinforcement Learning:
- Definition of a reinforcement learning task: states, actions, immediate rewards.
- Definition of policy, optimal policy, and value function.
- Computing the optimal policy from the optimal value function.
- The value iteration algorithm.
- Temporal difference learning using a neural network.
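
The value iteration algorithm, and reading the optimal policy off the resulting value function, can be sketched on a tiny task. The four-state chain, its rewards, and the discount factor below are hypothetical choices made for illustration.

```python
def q_value(s, a, V, transition, reward, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (reward(s, a, s2) + gamma * V[s2])
               for s2, p in transition(s, a))

def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-9):
    """Repeat Bellman backups until no state's value changes by more than tol."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions(s):
                continue  # terminal state: value stays 0
            best = max(q_value(s, a, V, transition, reward, gamma)
                       for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(states, actions, transition, reward, V, gamma=0.9):
    """Optimal policy from the optimal value function: pick the best action."""
    return {s: max(actions(s),
                   key=lambda a: q_value(s, a, V, transition, reward, gamma))
            for s in states if actions(s)}

# Hypothetical chain 0 - 1 - 2 - 3, where state 3 is terminal and
# entering it yields an immediate reward of 1.
states = [0, 1, 2, 3]

def actions(s):
    return [] if s == 3 else ["left", "right"]

def transition(s, a):
    s2 = max(s - 1, 0) if a == "left" else s + 1
    return [(s2, 1.0)]  # deterministic: one successor with probability 1

def reward(s, a, s2):
    return 1.0 if s2 == 3 else 0.0

V = value_iteration(states, actions, transition, reward)
policy = greedy_policy(states, actions, transition, reward, V)
```

This sketch stores the value function in a table; temporal difference learning with a neural network would replace the table with a learned approximation.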

- Diagnosis with Belief Networks
- Diagnosis problem: repair the device while minimizing total average cost of repair.
- Computing optimal policies in the repair-only case with the single-fault assumption (using the ratio of probability of failure to cost of observation).
- Computing optimal policies in the general case by complete analysis of the decision tree (working backwards, taking expected values and maxima).
- Computing approximately-optimal policies for the case where we have a mix of repairable and purely observable components.
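
A sketch of the ratio rule for the repair-only case, under these assumptions: exactly one component is faulty (single fault), components are handled one at a time until the fault is found, and each component has a fixed cost that is paid when it is handled. The component names, probabilities, and costs are hypothetical. The example checks the ratio ordering against a brute-force search over all orderings.

```python
from itertools import permutations

def expected_cost(order, prob, cost):
    """Expected total cost when components are handled in the given order.

    If component i is the faulty one, we pay for every component up to
    and including i before the device is fixed.
    """
    total, spent = 0.0, 0.0
    for comp in order:
        spent += cost[comp]
        total += prob[comp] * spent
    return total

def ratio_order(prob, cost):
    """Handle components in decreasing order of probability / cost."""
    return sorted(prob, key=lambda c: prob[c] / cost[c], reverse=True)

# Hypothetical fault probabilities (sum to 1 under the single-fault
# assumption) and per-component costs.
prob = {"battery": 0.5, "starter": 0.3, "fuel_pump": 0.2}
cost = {"battery": 10.0, "starter": 30.0, "fuel_pump": 5.0}

order = ratio_order(prob, cost)
best_by_search = min(permutations(prob),
                     key=lambda o: expected_cost(o, prob, cost))
```

The general case, where observations and repairs are mixed, does not reduce to a single sort like this and needs the decision-tree analysis listed above.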

- Probabilistic Reasoning:
- Random variables, expected values, joint distribution, marginal probability, conditional probability.
- Computing marginal and conditional probabilities from the joint distribution.
- Algebra of probability distributions: chain rule, independence, conditional independence.
- Belief networks. What probability distribution is stored at each node. Computing the joint distribution by taking the conformal product of all of the individual distributions.
- Computing probabilities for diagnosis. Implementing the single-fault assumption.
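
A small sketch of these computations, assuming a hypothetical two-node belief network Fault -> Alarm with made-up probabilities. The Fault node stores P(Fault), the Alarm node stores P(Alarm | Fault), and the joint distribution is built by multiplying the entries that agree on each variable's value. Marginals and conditionals are then computed from the joint.

```python
# Distribution stored at each node (hypothetical numbers).
p_fault = {True: 0.1, False: 0.9}
p_alarm_given_fault = {
    True:  {True: 0.9, False: 0.1},   # P(Alarm | Fault = True)
    False: {True: 0.2, False: 0.8},   # P(Alarm | Fault = False)
}

# Joint distribution P(Fault, Alarm) via the chain rule:
# P(f, a) = P(f) * P(a | f). Keys are (fault, alarm) pairs.
joint = {(f, a): p_fault[f] * p_alarm_given_fault[f][a]
         for f in (True, False) for a in (True, False)}

def marginal(joint, index, value):
    """Sum the joint over every assignment where variable `index` == value."""
    return sum(p for assignment, p in joint.items()
               if assignment[index] == value)

def conditional(joint, target, given):
    """P(target | given), where each argument is an (index, value) pair."""
    (ti, tv), (gi, gv) = target, given
    numerator = sum(p for assignment, p in joint.items()
                    if assignment[ti] == tv and assignment[gi] == gv)
    return numerator / marginal(joint, gi, gv)

FAULT, ALARM = 0, 1
p_alarm = marginal(joint, ALARM, True)
p_fault_given_alarm = conditional(joint, (FAULT, True), (ALARM, True))
```

For diagnosis, the quantity of interest is the last one: the probability of the fault given the observed symptom.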