CS 533 - Intelligent Agents and Decision Making
  Spring 2016


Overview

In this course we will study models and algorithms for automated planning and decision making. The course is divided into four main sections. First, we will study planning in the context of Markov decision processes (MDPs), where the environment is allowed to be stochastic. We will cover the basic theory and algorithms for explicit state-space MDPs, which exactly solve small to moderately sized problems. Second, we will study the basic theory and algorithms of reinforcement learning, where the agent is not given a model of the environment but must instead learn to act by directly interacting with it. We will learn about a number of algorithms and paradigms, including temporal-difference learning, policy gradient methods, and least-squares methods. Third, we will study Monte-Carlo planning, a middle ground between reinforcement learning and MDP planning in which a simulator of the system to be controlled is available and can be used to make intelligent action choices. Finally, we will study approaches for solving enormous MDPs that are expressed via compact representations. In particular, we will learn about symbolic dynamic programming for symbolically solving MDPs, and we will also study algorithms specialized to deterministic planning problems, which offer perhaps the best scalability when they apply to a problem.

Learning Objectives of the Course:

1. Understand the basic theory and definitions of Markov decision processes (MDPs).

2. Understand the basic algorithms for solving explicit state-space MDPs: value iteration and policy iteration.

3. Understand basic reinforcement learning algorithms: temporal-difference learning, policy gradient methods, least-squares policy iteration.

4. Understand basic Monte-Carlo planning algorithms for Markov decision processes: policy rollout, sparse sampling, Monte-Carlo tree search, approximate policy iteration.

5. Understand the primary paradigms and algorithms for planning with factored representations of enormous MDPs.
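To give a flavor of objective 2, here is a minimal sketch of value iteration for an explicit state-space MDP, written in Python. The two-state MDP, its transition table `T`, and reward table `R` are hypothetical examples invented for illustration, not material from the course.

```python
# Minimal value-iteration sketch for an explicit state-space MDP.
# T[s][a] is a list of (next_state, probability) pairs; R[s][a] is the
# immediate reward for taking action a in state s. This is a toy
# illustration, not a definitive implementation.

def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step reward plus discounted
            # expected value of the successor state.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Tiny hypothetical two-state MDP: moving between states earns reward 1.
states = ["A", "B"]
actions = ["stay", "go"]
T = {
    "A": {"stay": [("A", 1.0)], "go": [("B", 1.0)]},
    "B": {"stay": [("B", 1.0)], "go": [("A", 1.0)]},
}
R = {
    "A": {"stay": 0.0, "go": 1.0},
    "B": {"stay": 1.0, "go": 0.0},
}
V = value_iteration(states, actions, T, R)
```

With discount factor 0.9, both states converge to value 10 (a reward of 1 per step, discounted geometrically: 1 / (1 - 0.9)), which matches the fixed point of the Bellman optimality equation for this toy problem.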

Assignments:

There will be a number of assignments. Each will generally involve implementing and evaluating one or more algorithms and reporting their results. You are free to use any programming language you like to complete the assignments.

Final Project:

Students will work on a final project during the last month of the course. Small teams are allowed. The topic of the final project is up to the students, but must be relevant to the course content and approved by the instructor. Each team will present their final project to the class during the week of final exams. 

Grades

The final grade will be calculated as follows: Assignments 75%, Final Project 25%.

Honor Code

Collaboration on assignments is not permitted. The instructor will actively check for copying of code and solutions. The work you hand in should be your own. Any violation of these rules will result in failing the course.