Machine Learning (A Geometric Approach), CUNY Graduate Center, Spring 2013

Time and Location M 9:30-11:30am, Room 4419
Personnel Prof. Liang Huang (huang @ cs.qc), Instructor
Kai Zhao (z.kaayy @ gmail), TA
Office Hours
LH: M 11:30-12pm, CS Lab
KZ: M 3-4pm & F 4-5pm, CS Lab
Additional office hours available before exams.
Prerequisites
  • CS: algorithms and data structures (e.g., dynamic programming); solid programming skills.
  • Math: good understanding of linear algebra and basic probability theory; good sense of geometric intuition.

Textbooks Unlike other fields of CS, there isn't a standard textbook in machine learning, because (1) the field is developing so fast that classical textbooks become outdated quickly, and (2) most recent textbooks are written from a statistical or engineering perspective (rather than a CS one), which makes them unnecessarily complicated.

In light of this I recommend (for references only; this course is self-contained):

  • Tom Mitchell (1997). Machine Learning. a classical textbook. CS perspective. an easy read. outdated but still more helpful than most recent ones.
  • Mohri et al (2012). Foundations of Machine Learning. theory perspective. covers more recent advances such as SVMs that weren't covered in Mitchell.
Note that I do not recommend Bishop (2007); it obfuscates things rather than explains them. Many people who have taught machine learning agree with me. However, the figures and illustrations in the Bishop book are very pretty, and I use them in my slides.
Grading
  • Homework: 15 x 4 = 60%.
    • theoretical and conceptual problems (please LaTeX your solutions).
    • programming exercises using Python + numpy.
    • late policy: you can submit two (2) HWs late (by 48 hours each).
  • Final Project: 5 (proposal) + 5 (talk) + 20 (report) = 30%. individually or in pairs.
  • Class Participation: 10%
    • asking/answering questions in class; helping peers on HWs (5%)
    • catching/fixing bugs in slides/exams/hw & other suggestions (2%)
    • reward for submitting fewer than 2 HWs late (3%)

Machine Learning revolves around the following central question:

  How can we make computers act without being explicitly programmed, and improve with experience?

In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, accurate spam filters, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.

This course will survey the most important algorithms and techniques in machine learning. The treatment of the math will be rigorous, but unlike most other machine learning courses, which take a statistical or engineering perspective with tons of equations, this course will focus on geometric intuitions and the algorithmic perspective. I will try my best to visualize every concept.

Tentative Schedule:
Week 1 (Jan 28): Intro to ML.
Intro to Python and numpy (Kai).
Week 2 (Feb 4): slides for lectures 2-4: Online Learning: Perceptron and Beyond.

Linear separation; geometric computation of margin.
Augmented space: extra dimension to get rid of bias.
Geometric review of linear algebra (along the way): dot-product (agreement), perpendicular, norm, normal vector, line/hyperplane equation, distance from point to line/hyperplane (margin).
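As a quick illustration of the geometric review above, here is a minimal numpy sketch of the point-to-hyperplane distance formula (w·x + b)/||w||; the function name and toy numbers are illustrative, not from the course materials:

```python
import numpy as np

def distance_to_hyperplane(x, w, b):
    """Signed distance from point x to the hyperplane w.x + b = 0.

    The normal vector w is perpendicular to the hyperplane, so the
    signed distance is (w.x + b) / ||w||; its absolute value is the
    geometric margin of x.
    """
    return (np.dot(w, x) + b) / np.linalg.norm(w)

# the line x + y - 2 = 0 has normal vector w = (1, 1);
# the origin lies at distance |0 + 0 - 2| / sqrt(2) = sqrt(2) from it
w, b = np.array([1.0, 1.0]), -2.0
print(distance_to_hyperplane(np.array([0.0, 0.0]), w, b))  # -> -1.4142... (negative side)
```

The augmented-space trick mentioned above corresponds to appending a constant-1 feature to x and folding b into w, so the hyperplane passes through the origin of the higher-dimensional space.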

Week 3 (Feb 11): Perceptron Convergence Proof; geometry vs. algebra.
Voted and Averaged Perceptron.
MIRA (geometric derivation).
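The averaged perceptron can be sketched in a few lines of numpy; this is an illustrative toy (function name, data, and the augmented constant-1 feature are my own), not the course's reference implementation:

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10):
    """Averaged perceptron on data X (n x d) with labels y in {-1, +1}.

    Sketch only (no separate bias term; assume the data is in augmented
    space, i.e. a constant-1 feature has been appended to absorb the bias).
    """
    n, d = X.shape
    w = np.zeros(d)        # current weight vector
    w_sum = np.zeros(d)    # running sum of weight vectors, for averaging
    for _ in range(epochs):
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:  # mistake: update
                w = w + y[i] * X[i]
            w_sum += w                       # accumulate after every example
    return w_sum / (epochs * n)              # averaged weight vector

# toy linearly separable data (last column is the constant-1 bias feature)
X = np.array([[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = averaged_perceptron(X, y)
print(np.sign(X @ w))  # should recover the training labels
```

The naive accumulation above is O(d) per example; the fast averaging trick covered later in the course avoids that cost.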
Week 4 (Feb 18): President's Day. Class moved to Wednesday.
Week 4 (Wed Feb 20): passive-aggressive; p-aggressive MIRA.
demo of perceptron vs. aggressive MIRA.
multiclass decision boundaries (geometry).
multiclass perceptron.
HW1 out: (avg) perceptron, (aggressive) MIRA, feature preprocessing.
dataset: adult-50k.
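The multiclass perceptron keeps one weight vector per class; on a mistake it promotes the correct class and demotes the predicted one. A minimal numpy sketch (names and toy data are my own):

```python
import numpy as np

def multiclass_perceptron(X, y, num_classes, epochs=10):
    """Multiclass perceptron: one weight vector per class.

    Predict the class whose weight vector scores highest; on a mistake,
    promote the correct class and demote the wrongly predicted one.
    """
    W = np.zeros((num_classes, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.argmax(W @ xi))
            if pred != yi:
                W[yi] += xi    # promote the correct class
                W[pred] -= xi  # demote the predicted class
    return W

# toy 3-class data in 2-D (augmented with a constant-1 feature)
X = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [-1.0, -1.0, 1.0]])
y = np.array([0, 1, 2])
W = multiclass_perceptron(X, y, num_classes=3)
print((W @ X.T).argmax(axis=0))  # -> [0 1 2]
```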
Week 5 (Feb 25): engineering tips: shuffling, variable learning rate.
feature preprocessing: rescaling/centering, binning, categorical to binary.
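Two of the preprocessing steps above, rescaling/centering and converting categorical features to binary indicators, can be sketched in numpy as follows (function names and examples are illustrative):

```python
import numpy as np

def standardize(X):
    """Center each feature to mean 0 and rescale to unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0   # leave constant features alone
    return (X - mu) / sigma

def one_hot(column, values):
    """Map a categorical column to binary indicator features, one per value."""
    return np.array([[1.0 if v == u else 0.0 for u in values] for v in column])

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(standardize(X).mean(axis=0))   # -> [0. 0.]
print(one_hot(["red", "blue", "red"], ["red", "blue"]))
```

Rescaling matters for the perceptron family: features on wildly different scales distort the geometry (and hence the margins) of the data.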

slides for lectures 5-6: Kernels and Kernelized Online Learning.

Non-linear features and feature combination; XOR
Polynomial Kernels (distorts angle)

Week 6 (Mar 4): kernelized perceptron in primal and dual (Kai).
polynomial kernel on examples: XOR, circle.
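The XOR example can be worked through with a dual (kernelized) perceptron and a degree-2 polynomial kernel; this is a toy sketch under my own naming, not the course's reference code:

```python
import numpy as np

def poly_kernel(x, z, d=2):
    """Inhomogeneous polynomial kernel (x.z + 1)^d."""
    return (np.dot(x, z) + 1) ** d

def kernel_perceptron(X, y, kernel, epochs=20):
    """Dual (kernelized) perceptron: keep a mistake count alpha_i per example."""
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        for i in range(n):
            score = np.sum(alpha * y * K[:, i])
            if y[i] * score <= 0:   # mistake in the feature space
                alpha[i] += 1
    return alpha

# XOR: not linearly separable in the original space,
# but separable under the degree-2 polynomial kernel
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])
alpha = kernel_perceptron(X, y, poly_kernel)
preds = [np.sign(sum(alpha[j] * y[j] * poly_kernel(X[j], x) for j in range(4))) for x in X]
print(preds)  # all four XOR labels recovered
```

The dual form never materializes the feature map explicitly; all computation goes through kernel evaluations.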
Week 7 (Mar 11): Gaussian kernel (distorts distance); geometric interpretation in the original and feature spaces; connection to instance-based learning (nearest neighbor).
Kernel properties: Mercer's condition, gram matrix, positive semidefinite (eigenvalues); example of a non-kernel.
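The positive-semidefiniteness property is easy to check numerically: build a gram matrix and inspect its eigenvalues. A small numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Gaussian (RBF) kernel gram matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
sigma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))

# Mercer's condition: a valid kernel's gram matrix is symmetric positive
# semidefinite on any sample, i.e. all eigenvalues are >= 0
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)  # -> True
```

A function that produces a gram matrix with a negative eigenvalue on some sample cannot be a kernel, since no feature map could generate it.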

slides for lectures 7-8: SVM and convex optimization.

SVM optimization in primal (without bias).
Lagrangian (for inequality constraints).
KKT conditions => support vectors.
Geometry of Lagrangian and KKT.
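The course derives the SVM optimum via the Lagrangian and KKT conditions; as a complementary sketch, the same primal objective (lambda/2)||w||^2 + mean hinge loss (without bias) can be minimized directly by subgradient descent. All names and toy numbers here are my own:

```python
import numpy as np

def svm_primal_subgradient(X, y, lam=0.01, epochs=200, eta=0.1):
    """Minimize (lam/2)||w||^2 + mean(max(0, 1 - y_i w.x_i)) by subgradient descent.

    A plain-numpy sketch of the primal SVM objective (no bias term),
    not the Lagrangian/dual derivation used in lecture.
    """
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w)
        viol = margins < 1   # examples inside the margin contribute to the hinge
        grad = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        w -= eta * grad
    return w

X = np.array([[2.0, 2.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = svm_primal_subgradient(X, y)
print(np.sign(X @ w))  # -> [ 1.  1. -1. -1.]
```

By the KKT conditions, only the examples with margin exactly 1 (the support vectors) end up determining the optimal w.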

HW1 due.
Week 8 (Mar 18): SVM optimization in dual.
Hildreth algorithm (Kai).
kernels; demo.
HW2 out: kernelized perceptron and SVM.
dataset: spambase.
Week 9 (Mar 25): Spring Break.

HW2 due on Friday 4/5.

Week 10 (Apr 1).
Week 11 (Apr 8): slides for Lectures 9-10: structured prediction.

From Multiclass to Structured Classification.
Structured Perceptron (Collins 2002).
HMM Tagging.

Week 12 (Apr 15): Dynamic Programming. Viterbi vs. Dijkstra.
Fast Averaging Trick (Daume, 2006).
Proof of Convergence.
Perceptron with Inexact Search: Early Update, Violation-Fixing, LaSO, etc.
(Collins and Roark, 2004; Daume and Marcu, 2005; Huang et al., 2012)
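The Viterbi algorithm finds the highest-scoring state sequence by dynamic programming in O(TK^2) time. A self-contained numpy sketch with hypothetical HMM parameters (the numbers are made up for illustration):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Viterbi decoding for an HMM: best state sequence for observations obs.

    pi: initial log-probs (K,), A: transition log-probs (K, K),
    B: emission log-probs (K, V). Runs in O(T K^2) by dynamic programming.
    """
    T, K = len(obs), len(pi)
    delta = np.empty((T, K))           # best log-prob of a path ending in state k at time t
    back = np.zeros((T, K), dtype=int) # backpointers to the best previous state
    delta[0] = pi + B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + A      # (previous state, next state)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + B[:, obs[t]]
    # follow backpointers from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# tiny 2-state, 2-symbol example in log space (hypothetical numbers)
pi = np.log(np.array([0.6, 0.4]))
A = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
B = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
print(viterbi([0, 0, 1], pi, A, B))  # -> [0, 0, 1]
```

Replacing max with a priority queue recovers Dijkstra's algorithm; Viterbi exploits the fixed topological (left-to-right) order instead.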
Week 13 (Apr 22): Unsupervised Learning.
project proposal due 4/24.
Week 14 (Apr 29): EM.
mixture of Gaussians.
HW3 out: structured perceptron w/ exact and inexact search.
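EM for a mixture of Gaussians alternates between computing posterior responsibilities (E-step) and re-estimating the parameters from them (M-step). A minimal 1-D, two-component sketch with my own initialization and synthetic data:

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D mixture of Gaussians (a minimal sketch)."""
    # crude init: put the two means at the data extremes
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()]) + 1e-6
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
w, mu, var = em_gmm_1d(x)
print(np.sort(mu))  # means near -5 and 5
```

Each iteration is guaranteed not to decrease the data log-likelihood, though EM may converge to a local optimum depending on initialization.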
Week 15 (May 6): PCA.
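Geometrically, PCA finds the orthogonal directions of maximum variance: the top eigenvectors of the (centered) covariance matrix. A numpy sketch with synthetic data of my own:

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix (a minimal sketch).

    Returns the top-k principal directions (as columns) and the projected data.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    C = Xc.T @ Xc / (len(X) - 1)             # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh: ascending eigenvalues
    top = eigvecs[:, ::-1][:, :k]            # top-k directions (largest variance first)
    return top, Xc @ top

# data stretched along the direction (1, 1): the first principal
# component should point (up to sign) along that diagonal
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
X = t @ np.array([[1.0, 1.0]]) + 0.1 * rng.normal(size=(500, 2))
W, Z = pca(X, 1)
print(np.abs(W[:, 0]))  # both components near 1/sqrt(2) ~= 0.707
```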
Week 16 (May 13): last class.
Final Project Midway Presentations.
Week 17 (May 20).

Other machine learning courses (including videos):
Reading List (candidates for presentations):

  1. online-learning vs. SVM
  2. generative vs. discriminative
  3. learning logic programs
  4. EM
  5. Metric Learning and Dimensionality Reduction

Liang Huang
Last modified: Sun Mar 17 21:11:22 EDT 2013