Programming Assignment 1: Due January 19, 2000

Update: To use the priority queue code, you need the files tree.h, tree.c, queue.h and stack.h. Sorry about that!

In this assignment, you will implement and experiment with value iteration and prioritized sweeping for the "Jack's Car Rental" problem (p. 99 of Sutton and Barto). We will modify the problem statement slightly so that it is easier to complete.

This problem can be defined as follows:

States: (n[1], n[2]), where 0 <= n[1], n[2] <= 10, n[1] is the number of cars at location 1, and n[2] is the number of cars at location 2 at the end of the day.
Actions: transfer(x), where -3 <= x <= +3 is the number of cars transferred from location 1 to location 2. If x is negative, then -x cars are moved from location 2 to location 1. A transfer is not legal if it would leave fewer than zero or more than 10 cars at one of the locations.
Dynamics: At each time t (i.e., each day), and at each location l, we generate two random numbers: out (the number of cars requested to be rented) and in (the number of cars returned). These numbers are drawn according to Poisson distributions as follows:
- out[1] has a mean of 3, out[2] has a mean of 4
- in[1] has a mean of 3, and in[2] has a mean of 2.
Note that this means demand is higher at location 2, but the supply is higher at location 1, so in general, the optimal policy will need to move cars from location 1 to location 2.
The Poisson distribution generates a value of u with probability (lambda^u / u!) * exp(-lambda), where lambda is the mean parameter of the distribution, as given above.
Let n[L] be the number of cars at location L at the beginning of the day, let x be the net number of cars moved to this location (x is negative if cars are moved away), let in[L] be the number of cars returned, and let out[L] be the number of cars requested. Then the number of cars rented is rented[L] = min(n[L] + x, out[L]), because cars returned on one day are not available for rental until the next day. The number of cars remaining at the end of the day is min(10, n[L] + x + in[L] - rented[L]).
Reward Function: The reward is -2 * |x| + 10 * (rented[1] + rented[2]). The discount factor is 0.9.
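
The definitions above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the names PoissonPmf, IsLegalTransfer, SimulateDay and DayOutcome are hypothetical and do not come from jacks.h or jacks.cc, and the function takes the random draws (out and in) as arguments rather than sampling them. Here x follows the Actions convention, so location 1 loses x cars and location 2 gains x cars.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>

const int kMaxCars = 10;     // capacity of each location
const int kMaxTransfer = 3;  // |x| may not exceed 3

// Poisson probability of the value u when the mean is lambda:
// (lambda^u / u!) * exp(-lambda), built up one factor at a time.
double PoissonPmf(int u, double lambda) {
    double p = std::exp(-lambda);
    for (int k = 1; k <= u; ++k) p *= lambda / k;
    return p;
}

// True if moving x cars from location 1 to location 2 leaves
// both locations with between 0 and 10 cars.
bool IsLegalTransfer(int n1, int n2, int x) {
    if (std::abs(x) > kMaxTransfer) return false;
    int a = n1 - x;
    int b = n2 + x;
    return a >= 0 && a <= kMaxCars && b >= 0 && b <= kMaxCars;
}

// Hypothetical container for the result of one day.
struct DayOutcome {
    int next1;      // cars at location 1 at the end of the day
    int next2;      // cars at location 2 at the end of the day
    double reward;  // -2 * |x| + 10 * (rented[1] + rented[2])
};

// Apply one day of dynamics for fixed request (out) and return (in) counts.
// The caller is assumed to have checked IsLegalTransfer(n1, n2, x).
DayOutcome SimulateDay(int n1, int n2, int x,
                       int out1, int out2, int in1, int in2) {
    int avail1 = n1 - x;  // cars available at location 1 after the transfer
    int avail2 = n2 + x;  // cars available at location 2 after the transfer
    // Returned cars cannot be rented until the next day.
    int rented1 = std::min(avail1, out1);
    int rented2 = std::min(avail2, out2);
    DayOutcome d;
    d.next1 = std::min(kMaxCars, avail1 + in1 - rented1);
    d.next2 = std::min(kMaxCars, avail2 + in2 - rented2);
    d.reward = -2.0 * std::abs(x) + 10.0 * (rented1 + rented2);
    return d;
}
```

For example, SimulateDay(5, 5, 2, 4, 3, 1, 6) leaves 1 car at location 1 and 10 cars at location 2, with reward -2 * 2 + 10 * (3 + 3) = 56.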

I have implemented this domain in C++, and I have also implemented Value Iteration. In the assignment, your task is to implement prioritized sweeping and compare its performance with Value Iteration.

The code is organized in four files:

mdp.h defines an abstract class Problem, which encapsulates all of the domain-specific aspects of a problem. It also defines the class MDP, which implements Value Iteration and various helper functions. Your job is to modify the MDP class to implement Prioritized Sweeping.
jacks.h, jacks.cc. These files define the car rental domain by subclassing Problem. Notice that most of the work in this code is involved with constructing the probability transition model. The CPU time required to compute the model is about the same as the time required to perform value iteration!
jacksvi.cc. This is the main program. It just creates an instance of a JacksProblem and then an instance of MDP, and invokes MDP::ValueIteration. Finally, it prints out the resulting policy.

Your code should implement prioritized sweeping. As with my Value Iteration code, it should count the number of primitive Q backups and display this number when it terminates.
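
As a rough orientation only, here is a minimal sketch of prioritized sweeping on a generic tabular model, with a counter for primitive Q backups. It makes several assumptions that are not part of the provided code: the types Outcome, Model and SweepResult and the functions QBackup and PrioritizedSweeping are hypothetical and are not the Problem or MDP interfaces in mdp.h; it uses STL containers rather than the library shipped with the assignment; every state is assumed to have at least one action; and the predecessor priority gamma * P(s | p, a) * delta is one common heuristic, not necessarily the one you should use.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical tabular model: model[s][a] lists the outcomes of action a in s.
struct Outcome { int next; double prob; double reward; };
typedef std::vector<std::vector<std::vector<Outcome> > > Model;

struct SweepResult { std::vector<double> V; long backups; };

// One primitive Q backup: expected reward plus discounted next-state value.
double QBackup(const Model& m, const std::vector<double>& V, int s, int a,
               double gamma, long& backups) {
    ++backups;
    double q = 0.0;
    for (std::size_t i = 0; i < m[s][a].size(); ++i) {
        const Outcome& o = m[s][a][i];
        q += o.prob * (o.reward + gamma * V[o.next]);
    }
    return q;
}

SweepResult PrioritizedSweeping(const Model& m, double gamma, double theta) {
    const int n = static_cast<int>(m.size());
    SweepResult r;
    r.V.assign(n, 0.0);
    r.backups = 0;

    // pred[t] holds (s, p) pairs: s can reach t, and p is the largest
    // probability listed for t among the outcomes of any action in s.
    std::vector<std::vector<std::pair<int, double> > > pred(n);
    for (int s = 0; s < n; ++s) {
        std::vector<double> best(n, 0.0);
        for (std::size_t a = 0; a < m[s].size(); ++a)
            for (std::size_t i = 0; i < m[s][a].size(); ++i) {
                const Outcome& o = m[s][a][i];
                if (o.prob > best[o.next]) best[o.next] = o.prob;
            }
        for (int t = 0; t < n; ++t)
            if (best[t] > 0.0) pred[t].push_back(std::make_pair(s, best[t]));
    }

    // Every state starts with infinite priority so it is backed up at least once.
    std::vector<double> prio(n, std::numeric_limits<double>::infinity());
    std::priority_queue<std::pair<double, int> > pq;  // largest priority first
    for (int s = 0; s < n; ++s) pq.push(std::make_pair(prio[s], s));

    while (!pq.empty()) {
        std::pair<double, int> top = pq.top();
        pq.pop();
        int s = top.second;
        if (top.first != prio[s]) continue;  // stale entry, superseded later
        prio[s] = 0.0;

        // Full backup of V(s): one primitive Q backup per action.
        double v = QBackup(m, r.V, s, 0, gamma, r.backups);
        for (std::size_t a = 1; a < m[s].size(); ++a) {
            double q = QBackup(m, r.V, s, static_cast<int>(a), gamma, r.backups);
            if (q > v) v = q;
        }
        double delta = std::fabs(v - r.V[s]);
        r.V[s] = v;

        // Raise the priority of predecessors whose values may now be out of date.
        for (std::size_t i = 0; i < pred[s].size(); ++i) {
            int p = pred[s][i].first;
            double pr = gamma * pred[s][i].second * delta;
            if (pr > theta && pr > prio[p]) {
                prio[p] = pr;
                pq.push(std::make_pair(pr, p));
            }
        }
    }
    return r;
}
```

The queue is never updated in place; a state whose priority rises is simply pushed again, and older entries are skipped when popped. The provided priority queue code may support a different approach, so treat this only as one possible design.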

All of the code that you need for this problem is available in the following tar file. It has been tested under Solaris and on my laptop. I am using modified versions of Tim Budd's library routines for data structures, rather than STL (sorry!). If anyone wants to port this program to STL, that would be great!

Please turn in a hardcopy listing of your program and a comparison of prioritized sweeping and value iteration on this problem.