Topics to know for the midterm:

- Situations in which machine learning is useful.
- Definitions of terminology: training examples, features, classes, hypotheses, hypothesis classes, loss functions, adjustable parameters, VC dimension.
- Decision theory: How to use a loss function to decide what decision to make in order to minimize expected loss. How to handle reject options.
- Three main kinds of hypotheses: decision boundaries, conditional models P(y|X), and joint models P(X,y).
- How to make classification decisions using each of these.
- Types of hypothesis spaces: fixed versus variable, stochastic versus deterministic. The debate about which kind of method is best, and the factors disputed in that debate.
- Criteria for off-the-shelf learning algorithms. What does each of them mean?
- Details of specific learning algorithms and hypothesis spaces (type of decision boundary, learning algorithms, advantages and disadvantages according to the criteria for "off-the-shelf" learning):
- Linear threshold units (what can they express? What can't they express?). Ways of fitting LTUs via LMS, logistic regression, multivariate Gaussians, Naive Bayes (discrete case), and linear programming.
- Decision trees (including splitting rule and methods of handling missing values)
- Neural networks (including both squared error and softmax error; initialization of the weights)
- Nearest Neighbor (curse of dimensionality)
- Support Vector Machines (kernels, formulation as linear programming)
- Naive Bayes (how to compute it for discrete attributes; Laplace corrections; kernel density estimation)
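As a concrete refresher on computing Naive Bayes for discrete attributes with Laplace corrections, here is a minimal sketch (the function names, data layout, and the `alpha` parameter are illustrative, not from the course notes):

```python
import math
from collections import defaultdict

def train_naive_bayes(examples, labels, num_values, alpha=1.0):
    """Estimate P(y) and P(x_j = v | y) from discrete data.
    num_values[j] is the number of possible values of attribute j;
    alpha is the Laplace correction added to every count."""
    classes = sorted(set(labels))
    n_features = len(examples[0])
    class_count = {c: 0 for c in classes}
    feat_count = {c: [defaultdict(int) for _ in range(n_features)]
                  for c in classes}
    for x, y in zip(examples, labels):
        class_count[y] += 1
        for j, v in enumerate(x):
            feat_count[y][j][v] += 1
    priors = {c: class_count[c] / len(labels) for c in classes}

    def cond_prob(c, j, v):
        # Laplace-corrected estimate: (count + alpha) / (N_c + alpha * #values)
        return (feat_count[c][j][v] + alpha) / (class_count[c] + alpha * num_values[j])

    return classes, priors, cond_prob

def classify(x, classes, priors, cond_prob):
    # Choose argmax_c  log P(c) + sum_j log P(x_j | c)
    scores = {c: math.log(priors[c]) +
                 sum(math.log(cond_prob(c, j, v)) for j, v in enumerate(x))
              for c in classes}
    return max(scores, key=scores.get)
```

Working in log space avoids underflow from multiplying many small probabilities, and the Laplace correction keeps any unseen attribute value from forcing a probability of zero.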

- Computational Learning Theory: the Blumer bound for discrete and for continuous hypothesis spaces. Estimating the VC dimension by geometric analysis.
- Gradient descent search. How to design a gradient descent search algorithm. Difference between batch and incremental (stochastic) gradient descent.
- Linear programming. What is the standard form of a linear programming problem?
- Bayesian Learning Theory. What is Bayesian model averaging? What is the MAP hypothesis? How are they related?
- Bias-Variance Analysis. Definitions of bias, variance, and noise. Decompositions for squared error and 0/1 loss. Estimation using the bootstrap. Ensemble methods: bagging and boosting.
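The batch versus incremental (stochastic) gradient descent distinction can be sketched as follows; the LMS-style objective, toy data, and learning rate here are illustrative choices, not from the notes:

```python
import random

def batch_gradient_descent(grad, w, examples, eta=0.1, epochs=100):
    """Batch: sum the gradient over all examples, then take one step per epoch."""
    for _ in range(epochs):
        g = sum(grad(w, x, y) for x, y in examples)
        w = w - eta * g
    return w

def stochastic_gradient_descent(grad, w, examples, eta=0.1, epochs=100):
    """Incremental (stochastic): update after each individual example,
    visiting the examples in a random order each epoch."""
    examples = list(examples)
    for _ in range(epochs):
        random.shuffle(examples)
        for x, y in examples:
            w = w - eta * grad(w, x, y)
    return w

# Example objective: minimize sum over examples of (w*x - y)^2,
# whose per-example gradient is the LMS update rule.
def lms_grad(w, x, y):
    return 2 * (w * x - y) * x

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # y = 2x, so the optimum is w = 2
w_batch = batch_gradient_descent(lms_grad, 0.0, data, eta=0.01, epochs=500)
w_sgd = stochastic_gradient_descent(lms_grad, 0.0, data, eta=0.01, epochs=500)
```

Both routines converge to w = 2 on this data; the difference to remember is that batch descent computes one exact gradient step per pass, while stochastic descent takes many noisy steps per pass, which is cheaper per update and often faster on large training sets.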