Midterm Study Guide -- CS534 -- Spring 2005
Topics to know for the midterm:
- Situations in which machine learning is useful.
- Definitions of terminology: training examples, features, classes, hypotheses,
hypothesis classes, loss functions, adjustable parameters, VC dimension.
- Decision theory: How to use a loss function to decide what decision to make in order
to minimize expected loss. How to handle reject options.
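As a study aid, a minimal sketch of loss-minimizing decision making with a reject option. The loss values, class names, and posterior probabilities below are invented for illustration:

```python
# Decision theory sketch: choose the action minimizing expected loss
# under the posterior P(y|x); "reject" is just another action with a
# flat cost, so it wins automatically when the posterior is uncertain.

def expected_loss(action, posterior, loss):
    """Expected loss of an action under the posterior P(y|x)."""
    return sum(posterior[y] * loss[(action, y)] for y in posterior)

def best_action(posterior, loss, actions):
    """Pick the action with minimum expected loss."""
    return min(actions, key=lambda a: expected_loss(a, posterior, loss))

# Two classes, 0/1 loss for classifying, and a reject action costing 0.3.
loss = {
    ("pos", "pos"): 0.0, ("pos", "neg"): 1.0,
    ("neg", "pos"): 1.0, ("neg", "neg"): 0.0,
    ("reject", "pos"): 0.3, ("reject", "neg"): 0.3,
}
actions = ["pos", "neg", "reject"]

confident = {"pos": 0.9, "neg": 0.1}   # clear case: classify
uncertain = {"pos": 0.6, "neg": 0.4}   # ambiguous case: rejecting is cheaper

print(best_action(confident, loss, actions))  # pos
print(best_action(uncertain, loss, actions))  # reject
```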
- Three main kinds of hypotheses: decision boundaries, conditional models P(y|X), and
joint models P(X,y).
- How to make classification decisions using each of these.
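In symbols, the decision rule for each of the three kinds of hypotheses:

```latex
\hat{y} = h(x)
  % decision boundary: apply the classifier directly
\hat{y} = \arg\max_y P(y \mid x)
  % conditional model: pick the most probable class
\hat{y} = \arg\max_y P(x, y) = \arg\max_y P(x \mid y)\,P(y)
  % joint model: via Bayes' rule (the normalizer P(x) does not affect the argmax)
```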
- Types of hypothesis spaces: fixed versus variable, stochastic versus deterministic.
The debate about which kind of method is best, and the factors disputed in that debate.
- Criteria for off-the-shelf learning algorithms. What does each of them mean?
- Details of specific learning algorithms and hypothesis spaces (type of decision boundary,
learning algorithms, advantages and disadvantages according to
the criteria for "off-the-shelf" learning):
- Linear threshold units (what can they express? What can't they express?)
Ways of fitting LTUs: LMS, logistic regression, multivariate
Gaussians, Naive Bayes (discrete case), and linear programming.
- Decision trees (including splitting rule and methods of handling missing values)
- Neural networks (including both squared error and softmax error, initialization of
the weights)
- Nearest Neighbor (curse of dimensionality)
- Support Vector Machines (kernels, formulation as linear programming)
- Naive Bayes (how to compute it for discrete attributes; Laplace
corrections; kernel density estimation)
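A quick way to check the LTU expressiveness facts: the sketch below (invented weights and a small brute-force search over weight settings) shows that an LTU can compute AND but that no weight setting computes XOR:

```python
# A linear threshold unit predicts 1 iff w.x + b > 0. An LTU can express
# AND (e.g. w = (1, 1), b = -1.5) but XOR is not linearly separable,
# which the brute-force search below confirms on a grid of weights.
from itertools import product

def ltu(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = {x: int(x[0] and x[1]) for x in inputs}
XOR = {x: x[0] ^ x[1] for x in inputs}

def realizable(target, grid):
    """Search all (w1, w2, b) combinations on the grid for a matching LTU."""
    for w1, w2, b in product(grid, repeat=3):
        if all(ltu((w1, w2), b, x) == target[x] for x in inputs):
            return True
    return False

grid = [i / 2 for i in range(-8, 9)]   # -4.0 ... 4.0 in steps of 0.5
print(realizable(AND, grid))   # True
print(realizable(XOR, grid))   # False
```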
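For the decision-tree splitting rule, a sketch of information gain (entropy reduction) on an invented four-example dataset where one feature splits the labels perfectly and the other is uninformative:

```python
# Information-gain splitting rule: gain(f) = H(labels) minus the
# weighted entropy of the label subsets after splitting on feature f.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """Entropy reduction from splitting on a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(x[feature] for x in examples):
        subset = [y for x, y in zip(examples, labels) if x[feature] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

examples = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rain",  "windy": True},
    {"outlook": "rain",  "windy": False},
]
labels = ["no", "yes", "no", "yes"]     # determined entirely by "windy"

print(information_gain(examples, labels, "outlook"))  # 0.0
print(information_gain(examples, labels, "windy"))    # 1.0
```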
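For the curse of dimensionality, a small simulation (random points in the unit cube; sample size and dimensions are arbitrary choices): as the dimension grows, distances concentrate, so the contrast between the nearest and farthest point shrinks and "nearest" neighbor becomes less meaningful:

```python
# Curse of dimensionality: the relative spread (max - min) / min of
# distances from the origin to random points collapses as d grows.
import random
from math import sqrt

random.seed(0)

def min_max_ratio(d, n=200):
    """Relative spread of distances to n random points in [0,1]^d."""
    pts = [[random.random() for _ in range(d)] for _ in range(n)]
    ds = [sqrt(sum(c * c for c in p)) for p in pts]
    return (max(ds) - min(ds)) / min(ds)

ratios = {d: min_max_ratio(d) for d in (2, 20, 200)}
for d in (2, 20, 200):
    print(d, round(ratios[d], 2))   # the ratio shrinks as d grows
```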
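And for Naive Bayes with discrete attributes, a sketch with Laplace (add-one) corrections on the conditional probabilities; the tiny "spam" dataset is invented for illustration:

```python
# Naive Bayes, discrete case: pick argmax_y log P(y) + sum_f log P(x_f|y),
# with add-one (Laplace) smoothing so unseen feature values never get
# probability zero.
from math import log
from collections import Counter, defaultdict

def train(examples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)      # (feature, class) -> Counter over values
    values = defaultdict(set)        # feature -> set of observed values
    for x, y in zip(examples, labels):
        for f, v in x.items():
            cond[(f, y)][v] += 1
            values[f].add(v)
    return class_counts, cond, values

def predict(x, class_counts, cond, values):
    n = sum(class_counts.values())
    def score(y):
        s = log(class_counts[y] / n)
        for f, v in x.items():
            # Laplace correction: (count + 1) / (class total + #values)
            s += log((cond[(f, y)][v] + 1) / (class_counts[y] + len(values[f])))
        return s
    return max(class_counts, key=score)

examples = [{"word": "free"}, {"word": "free"},
            {"word": "hello"}, {"word": "hi"}]
labels = ["spam", "spam", "ham", "ham"]
model = train(examples, labels)

print(predict({"word": "free"}, *model))   # spam
print(predict({"word": "hello"}, *model))  # ham
```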
- Computational Learning Theory: Blumer bound for discrete hypothesis space and for
continuous hypothesis space. Estimating the VC dimension by geometric analysis.
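For reference, the standard statements of these bounds (m = number of training examples, epsilon = error tolerance, delta = failure probability):

```latex
% Blumer bound, finite (discrete) hypothesis space H: with probability at
% least 1 - \delta, any h \in H consistent with m i.i.d. training examples
% has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)

% Continuous hypothesis space, stated via the VC dimension d = VC(H):
m \;\ge\; \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta}
          + 8\,d\,\log_2\frac{13}{\epsilon}\right)
```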
- Gradient descent search. How to design a gradient descent search algorithm.
Difference between batch and incremental (stochastic) gradient descent.
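The batch-versus-incremental distinction in one sketch, using LMS fitting of a one-parameter linear model (the data points and step sizes are invented): batch sums the gradient over all examples before updating; stochastic updates after each single example.

```python
# Batch vs. incremental (stochastic) gradient descent on the LMS
# objective (1/2) * sum (w*x - y)^2 for a 1-D model y = w*x.
import random

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # roughly y = 2x

def batch_step(w, eta):
    # Batch: gradient summed over ALL examples, then one update.
    grad = sum((w * x - y) * x for x, y in data)
    return w - eta * grad

def stochastic_step(w, eta, example):
    # Incremental: update immediately from a SINGLE example's gradient.
    x, y = example
    return w - eta * (w * x - y) * x

w_batch = 0.0
for _ in range(100):
    w_batch = batch_step(w_batch, 0.01)

random.seed(0)
w_sgd = 0.0
for _ in range(100):
    w_sgd = stochastic_step(w_sgd, 0.01, random.choice(data))

# Both approach the least-squares answer (about 2.0 for this data).
print(round(w_batch, 2), round(w_sgd, 2))
```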
- Linear programming. What is the standard form of a linear programming problem?
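One common convention for the standard form (inequality constraints Ax <= b are converted to equalities by adding slack variables):

```latex
% Standard form of a linear program: minimize a linear objective over
% linear equality constraints with nonnegative variables.
\min_{x}\; c^{\top} x
\quad \text{subject to} \quad
A x = b, \qquad x \ge 0
```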
- Bayesian Learning Theory. What is Bayesian model averaging?
What is MAP? How are the two related?
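In symbols, the relationship between the two:

```latex
% Bayesian model averaging: weight EVERY hypothesis by its posterior.
P(y \mid x, D) \;=\; \sum_{h \in H} P(y \mid x, h)\, P(h \mid D)

% MAP approximates the sum by the single most probable hypothesis:
h_{\mathrm{MAP}} \;=\; \arg\max_{h \in H} P(h \mid D)
                 \;=\; \arg\max_{h \in H} P(D \mid h)\, P(h)
% so that P(y | x, D) \approx P(y | x, h_MAP).
```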
- Bias-Variance Analysis. Definitions of Bias, Variance, Noise.
Decomposition for squared error, 0-1 loss. Estimation using
the bootstrap. Ensemble methods: Bagging and Boosting.
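For the squared-error case, the decomposition to know (the 0-1 loss version decomposes differently); here h_D is the hypothesis learned from training set D, h-bar is its average over training sets, and y* is the optimal noise-free prediction:

```latex
% Bias-variance-noise decomposition of expected squared error at a point x,
% with \bar{h}(x) = E_D[h_D(x)]:
E_{D,y}\!\left[(h_D(x) - y)^2\right]
 = \underbrace{\left(\bar{h}(x) - y^{*}(x)\right)^2}_{\text{bias}^2}
 + \underbrace{E_D\!\left[(h_D(x) - \bar{h}(x))^2\right]}_{\text{variance}}
 + \underbrace{E_y\!\left[(y - y^{*}(x))^2\right]}_{\text{noise}}
```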