stack.h. Sorry about that!
In this assignment, you will implement and experiment with value iteration and prioritized sweeping for the "Jack's Car Rental" problem (p. 99 of Sutton and Barto). We will modify the problem statement slightly so that it is easier to complete.
This problem can be defined as follows:
The Poisson distribution generates a value of u with probability (lambda^u / u!) * exp(-lambda), where lambda is the mean parameter of the distribution, as given above.
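As an illustration, here is a minimal sketch of how the Poisson probability could be computed. The function name PoissonProb is hypothetical and is not taken from the distributed code. It builds the probability up iteratively rather than evaluating u! directly, which avoids overflow for larger u.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the assignment code).
// Returns P(u) = lambda^u / u! * exp(-lambda) for a Poisson
// distribution with mean lambda.
double PoissonProb(int u, double lambda) {
    assert(lambda >= 0.0);
    if (u < 0) return 0.0;           // negative counts have probability 0
    double p = std::exp(-lambda);    // P(0)
    for (int k = 1; k <= u; ++k) {
        p *= lambda / k;             // P(k) = P(k-1) * lambda / k
    }
    return p;
}
```

For example, PoissonProb(2, 3.0) evaluates 3^2 / 2! * exp(-3), and the probabilities summed over all u approach 1.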
Let n[L] be the number of cars at location L at the beginning of the day, let x be the number of cars moved to this location, let in[L] be the number of cars returned, and out[L] be the number of cars requested. Then the number of cars rented is rented[L] = min(n[L] + x, out[L]), because cars returned on one day are not available for rental until the next day. The number of cars remaining at the end of the day is min(10, n[L] + x + in[L] - rented[L]).
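The update for a single location can be sketched as follows. The names SimulateDay, DayResult, and kMaxCars are hypothetical and do not come from jacks.h or jacks.cc. The sketch also assumes that x is negative when cars are moved away from the location, which the text above does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>

const int kMaxCars = 10;  // capacity, from min(10, ...) in the text

// Hypothetical result type: what happened at one location in one day.
struct DayResult {
    int rented;     // cars actually rented out today
    int remaining;  // cars at the location at the end of the day
};

// n:   cars at the location at the beginning of the day
// x:   cars moved to this location (assumed negative if moved away)
// in:  cars returned today (not rentable until the next day)
// out: cars requested today
DayResult SimulateDay(int n, int x, int in, int out) {
    int available = n + x;
    assert(available >= 0);
    int rented = std::min(available, out);
    int remaining = std::min(kMaxCars, available + in - rented);
    return DayResult{rented, remaining};
}
```

For instance, with n = 3, x = 2, in = 4, and out = 7, only 5 cars can be rented even though 7 were requested, and 4 cars remain at the end of the day.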
The code is organized in four files:
mdp.h defines an abstract class Problem, which encapsulates all of the domain-specific aspects of a problem. It also defines the class MDP, which implements Value Iteration and various helper functions. Your job is to modify the MDP class to implement Prioritized Sweeping.
jacks.h, jacks.cc. These files define the car rental domain by subclassing Problem. Notice that most of the work in this code is involved with constructing the probability transition model. The CPU time required to compute the model is about the same as the time required to perform value iteration!
jacksvi.cc. This is the main program. It just creates an instance of JacksProblem and then an instance of MDP, and invokes MDP::ValueIteration. Finally, it prints out the resulting policy.
Your code should implement prioritized sweeping. As with my Value Iteration code, it should count the number of primitive Q backups and display this number when it terminates.
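To make the notion of counting backups concrete, here is a small stand-alone sketch of value iteration with a backup counter. It is not the MDP class from mdp.h: the types Outcome, Model, and VIResult and the function ValueIterationSketch are hypothetical. It also assumes that one primitive Q backup means one evaluation of the expected return for a single state-action pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One possible outcome of an action: next state, probability, reward.
struct Outcome {
    int next;
    double prob;
    double reward;
};

// model[s][a] lists the outcomes of action a in state s (hypothetical layout).
// A state with no actions is treated as terminal.
using Model = std::vector<std::vector<std::vector<Outcome>>>;

struct VIResult {
    std::vector<double> value;  // estimated value of each state
    long backups;               // number of primitive Q backups performed
};

VIResult ValueIterationSketch(const Model& model, double gamma, double epsilon) {
    assert(gamma >= 0.0 && gamma < 1.0);
    VIResult result{std::vector<double>(model.size(), 0.0), 0};
    double delta;
    do {
        delta = 0.0;  // largest value change seen in this sweep
        for (std::size_t s = 0; s < model.size(); ++s) {
            if (model[s].empty()) continue;  // terminal state keeps value 0
            double best = -std::numeric_limits<double>::infinity();
            for (const auto& outcomes : model[s]) {
                double q = 0.0;
                for (const Outcome& o : outcomes) {
                    q += o.prob * (o.reward + gamma * result.value[o.next]);
                }
                ++result.backups;  // one Q backup per (state, action) pair
                best = std::max(best, q);
            }
            delta = std::max(delta, std::fabs(best - result.value[s]));
            result.value[s] = best;
        }
    } while (delta > epsilon);
    return result;
}
```

Because every sweep backs up every state-action pair, the counter grows by the same amount on each sweep. Prioritized sweeping instead chooses which states to back up, so a counter kept in the same way gives a basis for comparing the two methods.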
All of the code that you need for this problem is available in the following tar file. It has been tested under Solaris and on my laptop. I am using modified versions of Tim Budd's library routines for data structures, rather than STL (sorry!). If anyone wants to port this program to STL, that would be great!
Please turn in a hardcopy listing of your program and a comparison of prioritized sweeping and value iteration on this problem.