Taken from http://www.cs.ualberta.ca/~sutton/RL-FAQ.html#backpropagation I am doing RL with a backpropagation neural network and it doesn't work; what should I do? It is a common error to use a backpropagation neural network as the function approximator in one's first experiments with reinforcement learning, which almost always leads to an unsatisfying failure. The primary reason for the failure is that backpropation is fairly tricky to use effectively, doubly so in an online application like reinforcement learning. It is true that Tesauro used this approach in his strikingly successful backgammon application, but note that at the time of his work with TDgammon, Tesauro was already an expert in applying backprop networks to backgammon. He had already built the world's best computer player of backgammon using backprop networks. He had already learned all the tricks and tweaks and parameter settings to make backprop networks learn well. Unless you have a similarly extensive background of experience, you are likely to be very frustrated using a backprop network as your function approximator in reinforcement learning. The solution is to step back and accept you can only innovate in one area at a time. First make sure you understand RL ideas in the tabular case, and the general principles of function approximation in RL. Then proceed to a better understood function approximator, such as a linear one. In my own work I never go beyond linear function approximators. I just augment them with larger and larger feature vectors, using coarse tile-coding (see e.g., Chapter 8 of the book, Sutton, 1996; Stone and Sutton, 2001). It may be that this approach would always be superior to backpropagation nets, but that remains to be seen. But someone new to learning and RL algorithms certainly should not start with backprop nets. while you're at it: http://www.cs.ualberta.ca/~sutton/book/the-book.html