CS 519-001, Natural Language Processing, Spring 2017

Audience PhD students in AI (and NLP in particular); MS students in AI who plan to continue to a PhD
Coordinates TR, 4-5:20pm, BEXL 207 [Registrar] [Canvas]
Instructor Liang Huang
TAs Juneki Hong
Office hours T 5:20-5:50pm and Th 3:30-3:55pm, KEC 2069 (Liang).
M/F 2-3pm KEC Atrium (Juneki).
Prerequisites
  • required: algorithms: CS 325/519/515.
    a solid understanding of dynamic programming is extremely important; see the short sketch after this list.
  • required: proficiency in at least one mainstream programming language (Python, C/C++, Java).
    HWs will be done in Python only. You can learn Python from these slides in 2 hours.
  • recommended: automata and formal language theory: CS 321/516.
  • recommended: machine learning: CS 534.
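For a sense of the dynamic programming style assumed above, here is a minimal sketch of memoized edit distance in Python. It is illustrative only, not part of any assignment; the function name and example strings are made up.

    # Minimal dynamic programming sketch: edit distance via memoization
    # (illustrative only; not an assignment).
    from functools import lru_cache

    def edit_distance(a, b):
        @lru_cache(maxsize=None)
        def dist(i, j):
            if i == 0:
                return j
            if j == 0:
                return i
            sub = 0 if a[i-1] == b[j-1] else 1
            return min(dist(i-1, j) + 1,        # deletion
                       dist(i, j-1) + 1,        # insertion
                       dist(i-1, j-1) + sub)    # substitution / match
        return dist(len(a), len(b))

    print(edit_distance("kitten", "sitting"))   # 3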
Textbooks
(optional)
  • Jurafsky and Martin. 2009. Speech and Language Processing (2nd ed.). (default)
  • Manning and Schütze. 1999. Foundations of Statistical Natural Language Processing.
Grading
(tentative)
  • HWs: programming homework: 12% x 5 = 60%.
  • EXs: simple exercises: 3% x 3 = 9%.
  • paper review: 12%.
  • quizzes: 7% + 7% = 14%.
  • class participation: 5%.
  • no exams, no project.
Other Policies
  • this course can be used to fulfill the AI area requirement.
  • no late submissions will be accepted (since you work in teams).
  • class participation: reward for helping others on Canvas, reporting bugs, etc.
Previous Offerings
MOOCs
(coursera)
  • Jurafsky and Manning (Stanford)
  • Collins (Columbia) -- more mathematical
Objectives This course provides an introduction to natural language processing, the study of human language from a computational perspective. We will cover finite-state machines (weighted FSAs and FSTs), syntactic structures (weighted context-free grammars and parsing algorithms), and machine learning methods (maximum likelihood and expectation-maximization). The focus will be on (a) modern quantitative techniques in NLP that use large corpora and statistical learning, and (b) various dynamic programming algorithms (Viterbi, CKY, Forward-Backward, and Inside-Outside). At the end of this course, students should have a good understanding of the research questions and methods used in different areas of natural language processing. Students should also be able to use this knowledge to implement simple natural language processing algorithms and applications. Students should also be able to understand and evaluate original research papers in natural language processing that build on and go beyond the textbook material covered in class.
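As a preview of the algorithmic flavor of the course, here is a minimal Viterbi sketch for a toy hidden Markov model. The states, observations, and probability tables below are made-up illustrations, not course material.

    # Minimal Viterbi sketch for a toy HMM (illustrative; all numbers are made up).
    def viterbi(obs, states, start_p, trans_p, emit_p):
        # best[t][s] = probability of the best path ending in state s at time t
        best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            best.append({})
            back.append({})
            for s in states:
                prob, prev = max((best[t-1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                                 for p in states)
                best[t][s] = prob
                back[t][s] = prev
        # follow backpointers from the best final state
        last = max(states, key=lambda s: best[-1][s])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    # toy example: weather states, activity observations (hypothetical numbers)
    states = ["Rainy", "Sunny"]
    start_p = {"Rainy": 0.6, "Sunny": 0.4}
    trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
               "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
    emit_p  = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
               "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
    print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
    # ['Sunny', 'Rainy', 'Rainy']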

Topics/Slides

Exercises

EXs usually prepare you for HWs and quizzes.

Programming Assignments

HWs are generally due every other Monday at midnight.
They involve Python implementations of various dynamic programming algorithms such as Viterbi, Forward-Backward, and CKY, as well as machine learning algorithms such as MLE and EM.
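For a flavor of the MLE-style estimation in the homeworks, here is a minimal bigram maximum-likelihood estimator in Python. It is a sketch, not an actual assignment; the function name and example sentences are made up.

    # Minimal MLE sketch for a bigram model (illustrative only; not an assignment).
    from collections import Counter

    def bigram_mle(sentences):
        unigrams, bigrams = Counter(), Counter()
        for words in sentences:
            words = ["<s>"] + words + ["</s>"]
            unigrams.update(words[:-1])
            bigrams.update(zip(words[:-1], words[1:]))
        # P(w2 | w1) = count(w1, w2) / count(w1)
        return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

    probs = bigram_mle([["the", "cat", "sat"], ["the", "dog", "sat"]])
    print(probs[("the", "cat")])   # 0.5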

Paper Review

Guidelines and Paper List

Background on Japanese

As you can see from the course materials, unlike most NLP courses, this class (following Kevin Knight's tradition) makes heavy use of the Japanese language as a running example to demonstrate linguistic diversity, to illustrate transliteration and translation, and to teach the Viterbi and EM algorithms. We do not require any prior knowledge of Japanese, but it helps to be familiar with the linguistic aspects of the language, especially its phonology. Here is a great video on the linguistic background of Japanese.