Machine Learning (A Geometric Approach), CUNY Graduate Center, Spring 2013

Time and Location	M 9:30-11:30am, Room 4419
Personnel	Prof. Liang Huang (huang @ cs.qc), Instructor Kai Zhao (z.kaayy @ gmail), TA
Office Hours (tentative)	LH: M 11:30-12pm, CS Lab KZ: M 3-4pm & F 4-5pm, CS Lab Additional office hours available before exams.
Prerequisites	CS: algorithms and datastructures (e.g. dynamic programming). solid at programming. Math: good understanding of linear algebra and basic probability theory. good sense of geometric intuitions.
Textbooks	Unlike other fields of CS, there isn't a standard textbook in machine learning because, (1) the field is developing so fast that classical textbooks become outdated constantly, and (2) most recent textbooks were written from a statistical or engineering perspective (rather than a CS one), which makes it unnecessarily complicated. In light of this I recommend (for references only; this course is self-contained): Tom Mitchell (1994). Machine Learning. a classical textbook. CS perspective. an easy read. outdated but still more helpful than most recent ones. Mohri et al (2012). Foundations of Machine Learning. theory perspective. covers more recent advances such as SVMs that weren't covered in Mitchell. Note that I do not recommend Bishop (2007); it obfuscates things rather than explains them. Many people who have taught machine learning agree with me. However, the figures and illustrations are very pretty in the Bishop book, and I use them in my slides.
Grading	Homework: 15 x 4 = 60%. theoretical and conceptual problems (please LaTeX your solutions). programming exercises using Python + numpy. late penalty: you can submit two (2) HWs late (by 48 hours each). Final Project: 5 (proposal) + 5 (talk) + 20 (report) = 30%. individually or in pairs. Class Participation: 10% asking/answering questions in class; helping peers on HWs (5%) catching/fixing bugs in slides/exams/hw & other suggestions (2%) reward for submitting less than 2 HWs late (3%)

Machine Learning evolves around the following central question:

How can we make computers to act without being explicitly programmed and to improve with experience?

In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, accurate spam filters, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.

This course will survey the most important algorithms and techniques in the field of machine learning. The treatment of math will be rigorous, but unlike most other machine learning courses which focus on the statistical or engineering perspective with tons of equations, my course will focus on the geometric intuitions and the algorithmic perspective. I will try my best to visualize every concept.

Tentative Schedule:

Week	Date	Topics	Homework
1	Jan 28	Intro to ML Intro to Python and numpy (Kai)
2	Feb 4	slides for lectures 2-4: Online Learning: Perceptron and Beyond. Linear separation; geometric computation of margin. Augmented space: extra dimension to get rid of bias. Perceptron. Geometric review of linear algebra (along the way): dot-product (agreement), perpendicular, norm, normal vector, line/hyperplane equation, distance from point to line/hyperplane (margin).
3	Feb 11	Perceptron Convergence Proof; geometry vs. algebra Voted and Averaged Perceptron MIRA (geometric derivation)
4	~~Feb 18~~	President's Day. Class moved to Wednesday.
4	W Feb 20	passive-aggressive; p-aggressive MIRA demo of perceptron vs. aggressive MIRA multiclass decision boundaries (geometry) multiclass perceptron	HW1 out: (avg) perceptron, (aggressive) MIRA, feature preprocessing. dataset: adult-50k.
5	Feb 25	engineering tips: shuffling, variable learning rate. feature preprocessing: rescaling/centering, binning, categorical to binary. slides for lectures 5-6: Kernels and Kernelized Online Learning. Non-linear features and feature combination; XOR Polynomial Kernels (distorts angle)
6	Mar 4	kernelized perceptron in primal and dual (Kai) polynomial kernel on examples: XOR, circle.
7	Mar 11	Gaussian kernel (distorts distance); geometric interpretation on the original and feature spaces; connection to instance-based learning (nearest neighbor) Kernel properties: Mercer, gram matrix, positive semidefinite (eigenvalue); example of a non-kernel slides for lectures 7-8: SVM and convex optimization. SVM optimization in primal (without bias). Lagrangian (for inequality constraints). KKT conditions => support vectors. Geometry of Lagrangian and KKT.	HW1 due.
8	Mar 18	SVM optimization in dual. Hildreth algorithm (Kai). soft-margin. kernels; demo.	HW2 out: kernelized perceptron and SVM. dataset: spambase.
9	~~Mar 25~~	Spring Break HW2 due on Friday 4/5.
10	~~Apr 1~~	Spring Break HW2 due on Friday 4/5.
11	Apr 8	slides for Lectures 9-10: structured prediction. From Multiclass to Structured Classification. Structured Perceptron (Collins 2002). HMM Tagging.
12	Apr 15	Dynamic Programming. Viterbi vs. Dijkstra. Fast Averaging Trick (Daume, 2006). Proof of Convergence. Perceptron with Inexact Search: Early Update, Violation-Fixing, LaSO, etc. (Collins and Roark:2004; Daume and Marcu, 2005; Huang et al 2012)
13	Apr 22	Unsupervised Learning k-means	project proposal due 4/24.
14	Apr 29	EM mixture of Gaussians	HW3 out: structured perceptron w/ exact and inexact search. data.
15	May 6	PCA ICA
16	May 13 last class	Final Project Midway Presentations
17	May 20

Other machine learning courses (including videos):

Pedro Domingos (U. Washington) (on coursera) (on UW)
Andrew Ng (Stanford) (on coursera) (on youtube)
Tom Mitchell (CMU)

Reading List (candidates for presentations):

online-learning vs. SVM
- Pegasos
- Gentile (2001): A New Approximate Maximal Margin Classification Algorithm (ALMA)
- Li and Long (2001): The Relaxed Online Maxiumum Margin Algorithm (ROMMA)
- Perceptron with Uneven Margins
- Ballseptron
- Forgetron
generative vs. discrminative
- Percy Liang et al
learning logic programs
- Unsupervised Semantic Parsing
EM
- Online EM
- EM with two maximizations
Metric Learning and Dimensionality Reduction

Liang Huang

Last modified: Sun Mar 17 21:11:22 EDT 2013