Each student is responsible for his or her own work. The standard departmental rules for academic dishonesty apply to all assignments in this course. Collaboration on homework and programming assignments must be limited to answering questions that can be asked and answered without using any written medium (e.g., no pencils, pens, or email). In particular, no student should read code written by another student.
tgd@cs.orst.edu
Jan  5   Agents. Markov decision problems. Partially observable Markov decision problems.
Jan  7   Optimal value functions and policies [SB 3]
Jan 10   Policy evaluation, policy iteration [SB 4]
Jan 12   Value iteration, generalized value iteration, prioritized sweeping [SB 4 and handout]
Jan 14   Monte Carlo methods [SB 5]
Jan 17   MLK Holiday: no class
Jan 19   TD(0), SARSA(0), Q learning [SB 6.4] (see the sketch following the schedule)
Jan 21   Average-reward DP: R learning [handout]
Jan 24   TD(lambda) [SB 7]
Jan 26   SARSA(lambda), Q(lambda) [SB 7]
Jan 28   TD(lambda) with function approximation [SB 8]
Jan 31   MIDTERM EXAM
Feb  2   Model-based learning. Compact models of the environment
Feb  4   Review of belief networks
Feb  7   Belief net inference: SPI
Feb  9   Belief net inference: junction tree algorithm
Feb 11   Constructing junction trees using SPI
Feb 14   Learning in belief nets: fully observable, known structure
Feb 16   Learning in belief nets: hidden variables, known structure. The hard EM algorithm for Gaussian mixtures
Feb 18   EM for naive Bayes mixture models
Feb 21   EM and overfitting; Dirichlet priors
Feb 23   Gibbs sampling with and without learning
Feb 25   Hidden Markov models: forward algorithm, MPE
Feb 28   Hidden Markov models: Viterbi algorithm
Mar  1   HMMs applied to speech recognition and DNA sequence modeling
Mar  3   Factorial HMMs and Monte Carlo inference in HMMs
Mar  6   HMMs applied to reinforcement learning of POMDPs. Greedy action selection in HMMs: value of information
Mar  8   Decomposition methods for MDPs: MAXQ
Mar 10   Class cancelled
Mar 16   FINAL EXAM, 14:00
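For concreteness, here is a minimal sketch of tabular one-step Q-learning, the algorithm scheduled for Jan 19 [SB 6.4]. The environment interface (env.actions, env.reset, env.step) and all parameter values are illustrative assumptions, not part of the course materials or the course text.

    import random
    from collections import defaultdict

    def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        """Tabular one-step Q-learning (cf. [SB 6.4]).

        Assumed environment interface (hypothetical, for illustration only):
          env.actions      -- list of all actions
          env.reset()      -- returns an initial state
          env.step(s, a)   -- returns (next_state, reward, done)
        """
        Q = defaultdict(float)  # Q[(state, action)]; unseen pairs default to 0.0

        for _ in range(n_episodes):
            s = env.reset()
            done = False
            while not done:
                # Epsilon-greedy action selection.
                if random.random() < epsilon:
                    a = random.choice(env.actions)
                else:
                    a = max(env.actions, key=lambda act: Q[(s, act)])
                s2, r, done = env.step(s, a)
                # One-step backup toward the greedy value of the successor state.
                best_next = 0.0 if done else max(Q[(s2, act)] for act in env.actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q

Once training finishes, a greedy policy can be read off the table as pi(s) = argmax over a of Q[(s, a)].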