Final Exam Study Guide -- CS534 -- Spring 2005
Topics to know for the final exam:
Topics from the first half of the course:
- Details of specific hypothesis spaces and learning algorithms.
- You should be able to look at a toy problem in two dimensions
and predict how each algorithm will behave (as on the midterm).
- Linear threshold units (what can they express? What can't they
  express?). Ways of fitting LTUs via perceptrons, logistic
  regression, multivariate Gaussians, linear programming, and
  quadratic programming (for SVMs).
- Decision trees (including splitting rules and methods of
handling missing values)
- Neural networks (including both squared error and softmax
  error, and initialization of the weights)
- Nearest Neighbor (curse of dimensionality)
- Support Vector Machines (including geometric and functional
margins, kernel methods, soft margin formulation, margin slack
vector, slack vector bound on test set error)
- Naive Bayes (how to compute it for discrete and continuous
  attributes; Laplace corrections -- a small worked example follows
  this list)
- Gradient descent search. How to design a gradient descent search
  algorithm. Difference between batch and incremental (stochastic)
  gradient descent. (A short sketch follows this list.)
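
Sketch for the gradient descent item above (illustration only: the toy
data set, learning rate, and epoch count are invented, and this is just
one reasonable way to write it). It fits a logistic-regression LTU two
ways so the difference shows up in the loops: batch computes the
gradient over the whole training set before each weight update, while
stochastic (incremental) updates after every example.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def batch_gd(X, y, lr=0.5, epochs=200):
        """Batch gradient descent: one weight update per pass over the data."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            grad = X.T @ (sigmoid(X @ w) - y) / len(y)    # gradient of the mean log-loss
            w -= lr * grad
        return w

    def stochastic_gd(X, y, lr=0.5, epochs=200, seed=0):
        """Stochastic (incremental) gradient descent: update after each example."""
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in rng.permutation(len(y)):             # visit examples in random order
                grad = (sigmoid(X[i] @ w) - y[i]) * X[i]  # gradient on a single example
                w -= lr * grad
        return w

    # Toy 2-D problem; the constant third column lets w[2] act as the threshold.
    X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
    y = np.array([0., 0., 0., 1.])    # the AND concept, which an LTU can express
    print("batch:     ", batch_gd(X, y))
    print("stochastic:", stochastic_gd(X, y))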
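
Sketch for the naive Bayes item above (illustration only: the tiny
(outlook, wind) data set is invented). Each conditional probability is
estimated with a Laplace correction -- add 1 to the count and add the
number of observed values of the attribute to the denominator -- so a
value never seen with a class still gets a nonzero probability.

    import math
    from collections import Counter, defaultdict

    def train_naive_bayes(examples, labels):
        """Fit naive Bayes on discrete attributes with Laplace (add-one) corrections."""
        n = len(labels)
        class_counts = Counter(labels)
        value_counts = defaultdict(Counter)   # value_counts[(attr, class)][value] = count
        attr_values = defaultdict(set)        # distinct values observed per attribute
        for x, y in zip(examples, labels):
            for j, v in enumerate(x):
                value_counts[(j, y)][v] += 1
                attr_values[j].add(v)

        def predict(x):
            scores = {}
            for c, nc in class_counts.items():
                # log P(c) + sum_j log P(x_j | c), each estimate Laplace-corrected:
                # (count + 1) / (examples in class c + number of values of attribute j)
                s = math.log(nc / n)
                for j, v in enumerate(x):
                    s += math.log((value_counts[(j, c)][v] + 1)
                                  / (nc + len(attr_values[j])))
                scores[c] = s
            return max(scores, key=scores.get)

        return predict

    # Invented toy data: (outlook, wind) -> play?
    X = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"), ("rain", "strong")]
    y = ["yes", "no", "yes", "no"]
    classify = train_naive_bayes(X, y)
    print(classify(("sunny", "weak")))    # -> "yes"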
Topics from the second half of the course:
- Triple tradeoff (sample size, hypothesis complexity, error rate)
and the problem of overfitting.
- Bias/variance/noise decomposition. You should be able to analyze
new and existing algorithms and predict whether they will have high
bias, high variance, or both. You should know how various ensemble
methods affect bias and variance.
- Methods for controlling overfitting
- Penalty methods: pessimistic tree pruning, cost-complexity
  pruning, MDL pruning, SVM margin maximization, weight decay,
  and weight elimination.
- Holdout methods: cross-validation, validation sets, early
  stopping, reduced-error pruning.
- Ensemble methods: bagging, boosting, Bayesian model averaging.
- Methods for Evaluating Classifiers and Learning Algorithms
- ROC curves
- Rejection curves
- Precision/recall curves
- McNemar's test and the 5x2cv F test (a worked example of
  McNemar's test appears at the end of this guide)
- Sequential Supervised Learning Problems
- Difference between sequential problems and standard
  supervised learning; sliding window methods for converting one
  into the other (a small example appears at the end of this guide).
- Difference between what the Viterbi algorithm computes and
what the Forward-Backward algorithm computes.
- What is the form of the hypothesis computed by the averaged
  perceptron, conditional random field, and hidden Markov model?
  I won't ask for the details of the fitting procedures or of the
  computations performed by Viterbi and Forward-Backward.
- Miscellaneous topics:
- Methodology for making classifier design decisions (e.g., using
  a development subset of the training data); a sketch appears at
  the end of this guide.
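
Worked example of McNemar's test (the disagreement counts are invented).
With n01 = test examples classifier A gets wrong but B gets right, and
n10 the reverse, the continuity-corrected statistic
(|n01 - n10| - 1)^2 / (n01 + n10) is compared against the chi-squared
distribution with one degree of freedom (0.05 critical value about 3.84).

    # Invented disagreement counts from comparing classifiers A and B on a test set:
    # n01 = examples A got wrong but B got right; n10 = the reverse.
    n01, n10 = 30, 14

    # McNemar's statistic with continuity correction; under the null hypothesis that
    # the two classifiers have the same error rate, it is roughly chi-squared with 1 df.
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

    # The chi-squared (1 df) critical value at the 0.05 level is about 3.841.
    print(f"statistic = {stat:.2f}; reject the null at the 0.05 level?", stat > 3.841)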
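
Small example of the sliding window conversion (the word/tag sequence is
made up): each position of a labeled sequence becomes one ordinary
supervised example whose features are the inputs in a window centered on
that position and whose label is the output at that position.

    def sliding_window(xs, ys, half_width=1, pad="<pad>"):
        """Turn one labeled sequence into standard (features, label) examples.

        Position t becomes an example whose features are the 2*half_width + 1
        inputs centered on t (padded at the ends) and whose label is ys[t].
        """
        padded = [pad] * half_width + list(xs) + [pad] * half_width
        return [(tuple(padded[t:t + 2 * half_width + 1]), ys[t])
                for t in range(len(xs))]

    # Invented word/tag sequence for illustration.
    words = ["the", "cat", "sat"]
    tags = ["DET", "NOUN", "VERB"]
    for features, label in sliding_window(words, tags, half_width=1):
        print(features, "->", label)    # e.g. ('<pad>', 'the', 'cat') -> DET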
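
Sketch of the development-subset methodology (illustration only: the
synthetic data, the split sizes, and the candidate values of k are
arbitrary). The test set is held out, a development set is carved out of
the remaining training data, the design decision -- here the choice of k
for nearest neighbor -- is made by train -> dev accuracy, and the test
set is touched exactly once at the end.

    import numpy as np

    def knn_predict(Xtr, ytr, Xte, k):
        """Classify each row of Xte by majority vote over its k nearest rows of Xtr."""
        d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=-1)   # squared distances
        idx = np.argsort(d, axis=1)[:, :k]                            # k nearest neighbors
        return (ytr[idx].mean(axis=1) >= 0.5).astype(int)             # majority vote (0/1 labels)

    def accuracy(pred, y):
        return float((pred == y).mean())

    rng = np.random.default_rng(0)
    # Synthetic 2-D data: two overlapping Gaussian blobs labeled 0 and 1.
    X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)

    # Hold out a test set, then carve a development set out of the remaining training data.
    perm = rng.permutation(len(y))
    test, rest = perm[:60], perm[60:]
    dev, train = rest[:40], rest[40:]

    # Make the design decision (the choice of k) using only train -> dev accuracy.
    best_k = max([1, 3, 5, 7, 9],
                 key=lambda k: accuracy(knn_predict(X[train], y[train], X[dev], k), y[dev]))

    # Use all of the training data (train + dev) with the chosen k; touch the test set once.
    print("chosen k =", best_k,
          "test accuracy =", accuracy(knn_predict(X[rest], y[rest], X[test], best_k), y[test]))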