Midterm Study Guide -- CS534 -- Spring 2005
Topics to know for the midterm:
- Situations in which machine learning is useful.
- Definitions of terminology: training examples, features, classes, hypotheses,
hypothesis classes, loss functions, adjustable parameters, VC dimension.
- Decision theory: How to use a loss function to decide what decision to make in order
to minimize expected loss. How to handle reject options.
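As a study aid, a minimal sketch of loss-minimizing decision making with a reject option. The loss values, class names, and posterior probabilities below are invented for illustration:

```python
# Decision theory sketch: choose the action minimizing expected loss
# under the posterior P(y|x); "reject" is just another action with a
# flat cost, so it wins automatically when the posterior is uncertain.

def expected_loss(action, posterior, loss):
    """Expected loss of an action under the posterior P(y|x)."""
    return sum(posterior[y] * loss[(action, y)] for y in posterior)

def best_action(posterior, loss, actions):
    """Pick the action with minimum expected loss."""
    return min(actions, key=lambda a: expected_loss(a, posterior, loss))

# Two classes, 0/1 loss for classifying, and a reject action costing 0.3.
loss = {
    ("pos", "pos"): 0.0, ("pos", "neg"): 1.0,
    ("neg", "pos"): 1.0, ("neg", "neg"): 0.0,
    ("reject", "pos"): 0.3, ("reject", "neg"): 0.3,
}
actions = ["pos", "neg", "reject"]

confident = {"pos": 0.9, "neg": 0.1}   # clear case: classify
uncertain = {"pos": 0.6, "neg": 0.4}   # ambiguous case: rejecting is cheaper

print(best_action(confident, loss, actions))  # pos
print(best_action(uncertain, loss, actions))  # reject
```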
- Three main kinds of hypotheses: decision boundaries, conditional models P(y|X), and
joint models P(X,y).
- How to make classification decisions using each of these.
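In symbols, the decision rule for each of the three kinds of hypotheses:

```latex
\hat{y} = h(x)
  % decision boundary: apply the classifier directly
\hat{y} = \arg\max_y P(y \mid x)
  % conditional model: pick the most probable class
\hat{y} = \arg\max_y P(x, y) = \arg\max_y P(x \mid y)\,P(y)
  % joint model: via Bayes' rule (the normalizer P(x) does not affect the argmax)
```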
- Types of hypothesis spaces: fixed versus variable, stochastic versus deterministic.
The debate about which kind of method is best, and the factors disputed in that debate.
- Criteria for off-the-shelf learning algorithms. What does each of them mean?
- Details of specific learning algorithms and hypothesis spaces (type of decision boundary,
learning algorithms, advantages and disadvantages according to
the criteria for "off-the-shelf" learning):
- Linear threshold units (what can they express? What can't they express?)
Ways of fitting LTUs: LMS, logistic regression, multivariate
Gaussians, Naive Bayes (discrete case), and linear programming.
- Decision trees (including splitting rule and methods of handling missing values)
- Neural networks (including both squared error and softmax error, initialization of
the weights)
- Nearest Neighbor (curse of dimensionality)
- Support Vector Machines (kernels, formulation as linear programming)
- Naive Bayes (how to compute it for discrete attributes; Laplace
corrections; kernel density estimation)
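A quick way to check the LTU expressiveness facts: the sketch below (invented weights and a small brute-force search over weight settings) shows that an LTU can compute AND but that no weight setting computes XOR:

```python
# A linear threshold unit predicts 1 iff w.x + b > 0. An LTU can express
# AND (e.g. w = (1, 1), b = -1.5) but XOR is not linearly separable,
# which the brute-force search below confirms on a grid of weights.
from itertools import product

def ltu(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = {x: int(x[0] and x[1]) for x in inputs}
XOR = {x: x[0] ^ x[1] for x in inputs}

def realizable(target, grid):
    """Search all (w1, w2, b) combinations on the grid for a matching LTU."""
    for w1, w2, b in product(grid, repeat=3):
        if all(ltu((w1, w2), b, x) == target[x] for x in inputs):
            return True
    return False

grid = [i / 2 for i in range(-8, 9)]   # -4.0 ... 4.0 in steps of 0.5
print(realizable(AND, grid))   # True
print(realizable(XOR, grid))   # False
```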
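For the decision-tree splitting rule, a sketch of information gain (entropy reduction) on an invented four-example dataset where one feature splits the labels perfectly and the other is uninformative:

```python
# Information-gain splitting rule: gain(f) = H(labels) minus the
# weighted entropy of the label subsets after splitting on feature f.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """Entropy reduction from splitting on a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(x[feature] for x in examples):
        subset = [y for x, y in zip(examples, labels) if x[feature] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

examples = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rain",  "windy": True},
    {"outlook": "rain",  "windy": False},
]
labels = ["no", "yes", "no", "yes"]     # determined entirely by "windy"

print(information_gain(examples, labels, "outlook"))  # 0.0
print(information_gain(examples, labels, "windy"))    # 1.0
```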
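For the curse of dimensionality, a small simulation (random points in the unit cube; sample size and dimensions are arbitrary choices): as the dimension grows, distances concentrate, so the contrast between the nearest and farthest point shrinks and "nearest" neighbor becomes less meaningful:

```python
# Curse of dimensionality: the relative spread (max - min) / min of
# distances from the origin to random points collapses as d grows.
import random
from math import sqrt

random.seed(0)

def min_max_ratio(d, n=200):
    """Relative spread of distances to n random points in [0,1]^d."""
    pts = [[random.random() for _ in range(d)] for _ in range(n)]
    ds = [sqrt(sum(c * c for c in p)) for p in pts]
    return (max(ds) - min(ds)) / min(ds)

ratios = {d: min_max_ratio(d) for d in (2, 20, 200)}
for d in (2, 20, 200):
    print(d, round(ratios[d], 2))   # the ratio shrinks as d grows
```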
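And for Naive Bayes with discrete attributes, a sketch with Laplace (add-one) corrections on the conditional probabilities; the tiny "spam" dataset is invented for illustration:

```python
# Naive Bayes, discrete case: pick argmax_y log P(y) + sum_f log P(x_f|y),
# with add-one (Laplace) smoothing so unseen feature values never get
# probability zero.
from math import log
from collections import Counter, defaultdict

def train(examples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)      # (feature, class) -> Counter over values
    values = defaultdict(set)        # feature -> set of observed values
    for x, y in zip(examples, labels):
        for f, v in x.items():
            cond[(f, y)][v] += 1
            values[f].add(v)
    return class_counts, cond, values

def predict(x, class_counts, cond, values):
    n = sum(class_counts.values())
    def score(y):
        s = log(class_counts[y] / n)
        for f, v in x.items():
            # Laplace correction: (count + 1) / (class total + #values)
            s += log((cond[(f, y)][v] + 1) / (class_counts[y] + len(values[f])))
        return s
    return max(class_counts, key=score)

examples = [{"word": "free"}, {"word": "free"},
            {"word": "hello"}, {"word": "hi"}]
labels = ["spam", "spam", "ham", "ham"]
model = train(examples, labels)

print(predict({"word": "free"}, *model))   # spam
print(predict({"word": "hello"}, *model))  # ham
```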
- Computational Learning Theory: Blumer bound for discrete hypothesis space and for
continuous hypothesis space. Estimating the VC dimension by geometric analysis.
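For reference, the standard statements of these bounds (m = number of training examples, epsilon = error tolerance, delta = failure probability):

```latex
% Blumer bound, finite (discrete) hypothesis space H: with probability at
% least 1 - \delta, any h \in H consistent with m i.i.d. training examples
% has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)

% Continuous hypothesis space, stated via the VC dimension d = VC(H):
m \;\ge\; \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta}
          + 8\,d\,\log_2\frac{13}{\epsilon}\right)
```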
- Gradient descent search. How to design a gradient descent search algorithm.
Difference between batch and incremental (stochastic) gradient descent.
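The batch-versus-incremental distinction in one sketch, using LMS fitting of a one-parameter linear model (the data points and step sizes are invented): batch sums the gradient over all examples before updating; stochastic updates after each single example.

```python
# Batch vs. incremental (stochastic) gradient descent on the LMS
# objective (1/2) * sum (w*x - y)^2 for a 1-D model y = w*x.
import random

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # roughly y = 2x

def batch_step(w, eta):
    # Batch: gradient summed over ALL examples, then one update.
    grad = sum((w * x - y) * x for x, y in data)
    return w - eta * grad

def stochastic_step(w, eta, example):
    # Incremental: update immediately from a SINGLE example's gradient.
    x, y = example
    return w - eta * (w * x - y) * x

w_batch = 0.0
for _ in range(100):
    w_batch = batch_step(w_batch, 0.01)

random.seed(0)
w_sgd = 0.0
for _ in range(100):
    w_sgd = stochastic_step(w_sgd, 0.01, random.choice(data))

# Both approach the least-squares answer (about 2.0 for this data).
print(round(w_batch, 2), round(w_sgd, 2))
```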
- Linear programming. What is the standard form of a linear programming problem?
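One common convention for the standard form (inequality constraints Ax <= b are converted to equalities by adding slack variables):

```latex
% Standard form of a linear program: minimize a linear objective over
% linear equality constraints with nonnegative variables.
\min_{x}\; c^{\top} x
\quad \text{subject to} \quad
A x = b, \qquad x \ge 0
```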
- Bayesian Learning Theory. What is Bayesian model averaging?
What is MAP? How are the two related?
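In symbols, the relationship between the two:

```latex
% Bayesian model averaging: weight EVERY hypothesis by its posterior.
P(y \mid x, D) \;=\; \sum_{h \in H} P(y \mid x, h)\, P(h \mid D)

% MAP approximates the sum by the single most probable hypothesis:
h_{\mathrm{MAP}} \;=\; \arg\max_{h \in H} P(h \mid D)
                 \;=\; \arg\max_{h \in H} P(D \mid h)\, P(h)
% so that P(y | x, D) \approx P(y | x, h_MAP).
```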
- Bias-Variance Analysis. Definitions of Bias, Variance, Noise.
Decomposition for squared error, 0-1 loss. Estimation using
the bootstrap. Ensemble methods: Bagging and Boosting.
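For the squared-error case, the decomposition to know (the 0-1 loss version decomposes differently); here h_D is the hypothesis learned from training set D, h-bar is its average over training sets, and y* is the optimal noise-free prediction:

```latex
% Bias-variance-noise decomposition of expected squared error at a point x,
% with \bar{h}(x) = E_D[h_D(x)]:
E_{D,y}\!\left[(h_D(x) - y)^2\right]
 = \underbrace{\left(\bar{h}(x) - y^{*}(x)\right)^2}_{\text{bias}^2}
 + \underbrace{E_D\!\left[(h_D(x) - \bar{h}(x))^2\right]}_{\text{variance}}
 + \underbrace{E_y\!\left[(y - y^{*}(x))^2\right]}_{\text{noise}}
```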