LING 83600/CS 84010, Language Technology (aka Natural Language Processing), CUNY Graduate Center, Spring 2013

Time and Location M 4:15-6:15pm, Room 5383
Personnel Prof. Liang Huang (huang @ cs.qc), Instructor
Jie Chu (jchu1 @ gc.cuny), TA
Office Hours
LH: M 6:15-6:30pm, CS Lab
JC: F 3-4pm, CS Lab
Additional office hours are available before HW due dates and exams.
Prerequisites CS: algorithms and data structures (especially recursion and dynamic programming); solid programming skills; basic understanding of automata theory.
Math: good understanding of basic probability theory.
Textbooks This course is self-contained (with slides and handouts) but you may find the following textbooks helpful:
  • Jurafsky and Martin. 2009 (2nd ed.). Speech and Language Processing. (default reference)
  • Manning and Schütze. 1999. Foundations of Statistical Natural Language Processing.
Grading
  • Homework: 10 + 15 + 10 + 13 = 48%.
    • programming exercises in Python plus pen-and-paper exercises
    • late penalty: you can submit two (2) HWs late (by 48 hours each).
  • Quiz: 7%
  • Final Project: 5 (proposal) + 5 (talk) + 15 (report) = 25%; individually or in pairs.
  • Exercises: 5+5=10%. graded by completeness, not correctness.
  • Class Participation: 10%
    • asking/answering questions in class; helping peers on HWs (5%)
    • catching/fixing bugs in slides/exams/hw & other suggestions (2%)
    • reward for submitting fewer than two HWs late (3%)

Tentative Schedule:
Week 1 (Jan 28): Intro to NLP and rudiments of linguistic theory.
  Intro to Python for text processing.
Unit 1: Sequences and Noisy-Channel
Week 2 (Feb 4): Basic automata theory: FSAs (DFA/NFA) and FSTs.
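To make the DFA material concrete, here is a minimal Python sketch of running a deterministic automaton given as a transition table; the dictionary encoding and the even-number-of-a's example are illustrative choices, not course code:

```python
def dfa_accepts(transitions, start, accepting, s):
    """Run a DFA encoded as {(state, symbol): next_state}.

    Missing transitions mean the DFA dies (rejects).
    """
    state = start
    for ch in s:
        if (state, ch) not in transitions:
            return False
        state = transitions[state, ch]
    return state in accepting

# DFA over {a, b} accepting strings with an even number of a's
delta = {("even", "a"): "odd",  ("odd", "a"): "even",
         ("even", "b"): "even", ("odd", "b"): "odd"}
assert dfa_accepts(delta, "even", {"even"}, "abba")      # two a's: accept
assert not dfa_accepts(delta, "even", {"even"}, "ab")    # one a: reject
```

An NFA sketch would differ only in tracking a *set* of current states per symbol.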
Week 3 (Feb 11): FSAs/FSTs cont'd.
  The noisy-channel model.
  Quiz 0 (Python and trees).
  HW1 out: FSAs/FSTs, carmel.
Week 4 (Feb 18): Presidents' Day; class moved to Wednesday.
Week 4 (W Feb 20): Probability theory and estimation.
  Weighted FSAs/FSTs.
  The noisy-channel model.
  Help on HW1.
Week 5 (Feb 25): Language models and smoothing; P(Obama), P( | Bush). HW1 due.
  Jie: discussion of HW1; mini-lecture on Unix; hash vs. array.
Week 6 (Mar 4): Smoothing: pseudocounts, prior/MAP; add-(less-than)-one, Witten-Bell, Good-Turing; backoff and interpolation.
  Quiz 0' (trees, stack, postfix/SOV, FSA (pluralizer), hash, binary search).
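The simplest smoothing method on the menu above, add-one (Laplace), can be sketched in a few lines of Python; the toy corpus, markers, and function name below are illustrative assumptions, not course materials:

```python
from collections import Counter

def bigram_prob_add_one(corpus_sents):
    """Return an add-one-smoothed bigram probability function.

    P(w | u) = (count(u, w) + 1) / (count(u) + V),
    where V is the vocabulary size (including </s>).
    """
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in corpus_sents:
        words = ["<s>"] + sent + ["</s>"]
        vocab.update(words[1:])          # <s> is never predicted
        for u, w in zip(words, words[1:]):
            unigrams[u] += 1
            bigrams[(u, w)] += 1
    V = len(vocab)
    return lambda w, u: (bigrams[(u, w)] + 1) / (unigrams[u] + V)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
p = bigram_prob_add_one(corpus)
# the unseen bigram "sat dog" still gets nonzero probability
assert p("dog", "sat") > 0
```

Witten-Bell and Good-Turing replace the flat "+1" with counts of how often novel events were seen.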
Week 7 (Mar 11): Entropy/perplexity; the Shannon game.
  HMMs and Viterbi; Japanese transliteration.
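The HMM/Viterbi topic can be sketched as follows; the weather/activity model is the standard textbook toy, and all names here are illustrative rather than course code:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an observation sequence.

    start_p[s], trans_p[s][t], emit_p[s][o] are probabilities.
    Runs the standard O(n * |states|^2) dynamic program with backpointers.
    """
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        layer = {}
        for t in states:
            prob, prev = max(
                (best[-1][s][0] * trans_p[s][t] * emit_p[t][o], s)
                for s in states)
            layer[t] = (prob, prev)
        best.append(layer)
    # backtrace from the best final state
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
path = viterbi(("walk", "shop", "clean"), states, start, trans, emit)
# -> ['Sunny', 'Rainy', 'Rainy']
```

For transliteration, the same recurrence runs over a weighted FST lattice instead of a fixed state set.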
Week 8 (Mar 18): Trigram Viterbi; Excel demo.
  More on English and Japanese phonology.
  Phonetics/Phonology 101: IPA, emic vs. etic.
  Help on Ex1.
  HW2 out: Shannon game, English pronunciation, and katakana transliteration.
Week 9 (Mar 25): Spring break.
  HW2 due on Friday 4/5.
Week 10 (Apr 1)
Unit 2: Trees and Grammars
Week 11 (Apr 8): CFGs. Jie: discussion of HW2.
  Proposal suggestions out.
Week 12 (Apr 15): PCFGs and CKY.
  Bottom-up vs. top-down dynamic programming with memoization.
  Hypergraphs: generalized topological sort; Viterbi => CKY; Dijkstra => Knuth.
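The CKY item above can be sketched as a Viterbi CKY chart parser for a PCFG in Chomsky normal form; the grammar encoding and the toy grammar below are illustrative assumptions, not course code:

```python
from collections import defaultdict

def cky(words, binary, lexical):
    """Viterbi CKY for a PCFG in CNF.

    binary:  {(B, C): [(A, prob), ...]}   rules A -> B C
    lexical: {word:  [(A, prob), ...]}    rules A -> word
    Returns best[i, j] = {A: best probability of A over words[i:j]}.
    """
    n = len(words)
    best = defaultdict(dict)
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            best[i, i + 1][A] = max(best[i, i + 1].get(A, 0.0), p)
    for span in range(2, n + 1):               # widths, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split points
                for B, pb in best[i, k].items():
                    for C, pc in best[k, j].items():
                        for A, p in binary.get((B, C), []):
                            cand = p * pb * pc
                            if cand > best[i, j].get(A, 0.0):
                                best[i, j][A] = cand
    return best

lexical = {"I": [("NP", 0.1)], "saw": [("V", 0.4)], "her": [("NP", 0.2)]}
binary = {("V", "NP"): [("VP", 0.5)], ("NP", "VP"): [("S", 1.0)]}
best = cky(["I", "saw", "her"], binary, lexical)
# best[0, 3]["S"] is the probability of the best S parse of the sentence
```

Adding backpointers alongside each chart entry recovers the tree; the hypergraph view generalizes exactly this chart.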
Week 13 (Apr 22): Probabilistic parsing with unary rules.
  Earley's algorithm.
  Proposal due.
  HW3 out: PCFG and CKY.
Unit 3: Language Learning
Week 14 (Apr 29): Unsupervised learning.
  EM (slow version).
  EM slides.
  Help on HW3.
Week 15 (May 6): Theory of EM convergence.
  EM (fast version: DP/forward-backward).
  HW3 due. HW4 out: EM on katakana transliteration.
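EM as covered above (the slow, non-DP flavor) can be illustrated on a toy problem; the two-coin mixture below is a classic illustrative example and an assumption of this sketch, not the course's katakana task:

```python
def em_two_coins(flips, theta_a, theta_b, iters=20):
    """EM for a mixture of two biased coins with equal prior.

    Each row in `flips` is (heads, tails) for one session drawn
    from coin A or coin B; the coin identity is the hidden variable.
    """
    for _ in range(iters):
        # E-step: fractional heads/tails attributed to each coin
        ha = ta = hb = tb = 0.0
        for h, t in flips:
            la = theta_a ** h * (1 - theta_a) ** t
            lb = theta_b ** h * (1 - theta_b) ** t
            wa = la / (la + lb)          # posterior P(coin = A | session)
            ha += wa * h; ta += wa * t
            hb += (1 - wa) * h; tb += (1 - wa) * t
        # M-step: re-estimate biases from expected counts
        theta_a = ha / (ha + ta)
        theta_b = hb / (hb + tb)
    return theta_a, theta_b

flips = [(9, 1), (8, 2), (2, 8), (1, 9), (9, 1)]
theta_a, theta_b = em_two_coins(flips, 0.6, 0.5)
# theta_a drifts toward the heads-heavy coin, theta_b toward the tails-heavy one
```

The fast version replaces the explicit enumeration in the E-step with forward-backward dynamic programming.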
Week 16 (May 13): Last week of classes.
  Project mid-way presentations. HW4 due on Thursday (last day of instruction).
Week 17 (May 20)

Other NLP/CL courses:
Reading List

Liang Huang
Last modified: Fri Mar 15 18:03:42 EDT 2013