| Time and Location | T 11:45am-1:45pm, Room 6496 |
| Personnel | Prof. Liang Huang (huang at cs.qc), Instructor; James Cross (jcross at gc.cuny), TA |
| Office Hours | Tuesday afternoons at the CS Lab. Additional office hours will be available before HW due dates and exams. |
| Prerequisites | CS: algorithms and data structures (especially recursion and dynamic programming); solid programming skills (in Python); basic understanding of formal language and automata theory. LING: minimal understanding of morphology, phonology, and syntax (we'll review these). MATH: good understanding of basic probability theory. |
| Textbooks / MOOCs | This course is self-contained (with slides and handouts), but you may find the following textbooks helpful: You might also find these Coursera courses helpful: |
| Grading | |
| Week | Date | Topics | Homework/Quiz |
| 1 | Sep 2 | Intro to NLP and rudiments of linguistic theory; intro to Python for text processing | Ex0 |
| Unit 1: Sequence Models and Noisy-Channel: Morphology, Phonology | |||
| 2 | Sep 9 | Basic automata theory. FSA (DFA/NFA) and FST. | |
| 3 | Sep 16 | FSAs/FSTs cont'd; the noisy-channel model. | HW1 out: FSAs/FSTs, carmel; recovering vowels |
| 4 | Sep 23 | RELIGIOUS HOLIDAY - NO CLASS | |
| 5 | Sep 30 | HW1 discussions; SVO/SOV vs. infix/postfix (advantage of SVO: less case-marking; advantage of SOV: no attachment ambiguity); simple pluralizer; language model: basic smoothing (Laplacian, Witten-Bell, Good-Turing) | Quiz 0; Ex1 out |
| 6 | Oct 7 | Language model (cont'd): information theory, entropy and perplexity, Shannon game; Viterbi decoding for HMM (a minimal sketch appears below the schedule); transliteration | HW2 out: English pronunciation, Japanese transliteration |
| 7 | Oct 14 | Pluralizer demo; discussions of HW2; more on HMM/Viterbi; sample code; intro to HW3 (semi-Markov) | HW3 out: decoding for Japanese transliteration |
| Unit 2: Unsupervised Learning for Sequences: Transliteration and Translation | |||
| 8 | Oct 21 | Korean vs. Japanese writing systems; more on semi-Markov Viterbi; EM for transliteration | |
| 9 | Oct 28 | More on EM: forward-backward and theory | HW4 out: EM for transliteration |
| 10 | Nov 4 | Machine Translation: IBM Models 1-2 | |
| 11 | Nov 11 | EM for IBM Model 1 | |
| 12 | Nov 18 | EM/HMM demo from Jason Eisner; pointwise mutual information vs. IBM Models 1 and 4 | |
| Unit 3: Tree Models: Syntax, Parsing, and Semantics | |||
| 13 | Nov 25 | CFG and CKY | HW5 out: IBM Model 1 |
| 14 | Dec 2 | Semantics intro; entailment; upward and downward monotonicity. | |
| 15 | Dec 9 (last class) | Compositional semantics: quantifiers, type raising. | HW6 out: parsing |
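
Weeks 6-7 cover Viterbi decoding for HMMs. The sketch below is only an illustrative example, not the course's own sample code: a minimal Python Viterbi decoder for a discrete HMM, where the `viterbi` function, the toy states, observations, and probabilities are all invented for the demo.

```python
# Minimal Viterbi decoder for a discrete HMM (illustrative sketch only;
# the toy states, observations, and probabilities below are made up).

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observation sequence."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # backpointers for recovering the best path

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # pick the best previous state to transition from
            prev, score = max(((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                              key=lambda x: x[1])
            best[t][s] = score * emit_p[s][obs[t]]
            back[t][s] = prev

    # best final state, then follow backpointers to reconstruct the path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# toy usage: the classic weather/activity example
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p))
# -> ['Sunny', 'Rainy', 'Rainy']
```

In practice one would work in log-space to avoid underflow, and the semi-Markov variant from Week 7 generalizes the same recurrence to segments spanning multiple observations.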