| Time and Location | T 11:45am-1:45pm, Room 6496 |
| Personnel | Prof. Liang Huang (huang at cs.qc), Instructor; James Cross (jcross at gc.cuny), TA |
| Office Hours | Tuesday afternoons at the CS Lab. Additional office hours will be available before HW due dates and exams. |
| Prerequisites | CS: algorithms and data structures (especially recursion and dynamic programming); solid programming skills (in Python); basic understanding of formal language and automata theory. LING: minimal understanding of morphology, phonology, and syntax (we'll review these). MATH: good understanding of basic probability theory. |
| Textbooks / MOOCs | This course is self-contained (with slides and handouts), but you may find the following textbooks helpful: You might also find these Coursera courses helpful: |
| Grading | |
| Week | Date | Topics | Homework/Quiz |
| 1 | Sep 2 | Intro to NLP and rudiments of linguistic theory; intro to Python for text processing | Ex0 |
| Unit 1: Sequence Models and Noisy-Channel: Morphology, Phonology | |||
| 2 | Sep 9 | Basic automata theory. FSA (DFA/NFA) and FST. | |
| 3 | Sep 16 | FSAs/FSTs cont'd; the noisy-channel model. | HW1 out: FSAs/FSTs, carmel; recovering vowels |
| 4 | Sep 23 | RELIGIOUS HOLIDAY - NO CLASS | |
| 5 | Sep 30 | HW1 discussions; SVO/SOV vs. infix/postfix (advantage of SVO: less case-marking; advantage of SOV: no attachment ambiguity); simple pluralizer; language model: basic smoothing (Laplacian, Witten-Bell, Good-Turing) | Quiz 0; Ex1 out |
| 6 | Oct 7 | Language model (cont'd): information theory, entropy and perplexity, Shannon game; Viterbi decoding for HMM (a minimal sketch appears below the schedule); transliteration | HW2 out: English pronunciation, Japanese transliteration |
| 7 | Oct 14 | Pluralizer demo; discussions of HW2; more on HMM/Viterbi; sample code; intro to HW3 (semi-Markov) | HW3 out: decoding for Japanese transliteration |
| Unit 2: Unsupervised Learning for Sequences: Transliteration and Translation | |||
| 8 | Oct 21 | Korean vs. Japanese writing systems; more on semi-Markov Viterbi; EM for transliteration | |
| 9 | Oct 28 | More on EM: forward-backward and theory | HW4 out: EM for transliteration |
| 10 | Nov 4 | Machine Translation: IBM Models 1-2 | |
| 11 | Nov 11 | EM for IBM Model 1 | |
| 12 | Nov 18 | EM/HMM demo from Jason Eisner; pointwise mutual information vs. IBM Models 1 and 4 | |
| Unit 3: Tree Models: Syntax, Parsing, and Semantics | |||
| 13 | Nov 25 | CFG and CKY | HW5 out: IBM Model 1 |
| 14 | Dec 2 | Semantics intro; entailment; upward and downward monotonicity. | |
| 15 | Dec 9 (last class) | Compositional semantics: quantifiers, type raising. | HW6 out: parsing |
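
Weeks 6-7 cover Viterbi decoding for HMMs. The sketch below is only an illustrative example, not the course's own sample code: a minimal Python Viterbi decoder for a discrete HMM, where the `viterbi` function, the toy states, observations, and probabilities are all invented for the demo.

```python
# Minimal Viterbi decoder for a discrete HMM (illustrative sketch only;
# the toy states, observations, and probabilities below are made up).

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observation sequence."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # backpointers for recovering the best path

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # pick the best previous state to transition from
            prev, score = max(((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                              key=lambda x: x[1])
            best[t][s] = score * emit_p[s][obs[t]]
            back[t][s] = prev

    # best final state, then follow backpointers to reconstruct the path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# toy usage: the classic weather/activity example
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p))
# -> ['Sunny', 'Rainy', 'Rainy']
```

In practice one would work in log-space to avoid underflow, and the semi-Markov variant from Week 7 generalizes the same recurrence to segments spanning multiple observations.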