MCPlan: Monte-Carlo Planning Library

Overview and Motivation
Many control and planning problems occur in domains for which we do not have exact model specifications, which makes traditional model-based planning problematic. Often, however, one can construct or obtain a simulator for such domains that allow one to sample possible system transitions. Given such a simulator the question is how to best construct algorithms that can interact with the simulator in order to make good planning/control decisions in the real system. The research area of Monte-Carlo Planning involves studying such algorithms from both a theoretical and empirical level.
Monte-Carlo planning has seen a number of success stories in recent years, most notably in the field of computer Go, where there has been incredible progress that can be primarily attributed to a focus on Monte-Carlo Planning techniques. While these individual successes are impressive, there still remain many questions about the generality of various Monte-Carlo Planning algorithms, which is traditionally one of the goals of automated planning research in the Artificial Intelligence community. There have been few systematic studies that compare different algorithms across a variety of planning problems.
The primary purpose of the MCPlan library is to help facilitate such systematic comparisons. The Java-based MCPlan library provides a common API between Monte-Carlo planners and planning environments/problems. This makes it easy to implement a new planning problem or Monte-Carlo planner and immediately evaluate with a variety of other planners for comparison.
The primary developers of the library have so far been Paul Lewis, Shan Xue, and Erich Merrill III under the direction of Alan Fern.
MCPlan Library – the current Java source of the MCPlan library
MCPlan Manual– the minimal manual for the MCPlan library (will be more extensive with next version in Fall of 2013)
Tutorial Slides that Cover Algorithms in Library: Basics of Bandits,Monte-Carlo Policy Improvement,Monte-Carlo Tree Search
Library Contents
The library is continually expanding and we expect to a major new release during the Fall of 2013. The current library has a number of common planning algorithms and some benchmark planning domains as listed below.
Planning Algorithms
  • Random. A baseline agent that selects actions uniformly at random.
  • Uniform Policy Rollout. A policy rollout agent that evaluates each action at the root uniformly.
  • Epsilon-Greedy Policy Rollout. A policy rollout agent that uses epsilon-greedy exploration at the root to sample action values.
  • UCT. An implementation of the well-known UCT algorithm, which is an instance of Monte-Carlo Tree Search that uses the UCB rule to control exploration.
  • Sparse ExpectiMax. An implementation of the Sparse Sampling Monte-Carlo planner.
  • Galcon Fusion Agents. The library includes several agents specialized for the domain Galcon Fusion (see below). These can serve as examples and/or opponents in that domain.
  • Coming Soon: Ensemble UCT; Policy Switching; Mini-Max Policy Switching; Family of Monte-Carlo Tree Search variants; Monte-Carlo variants of Real-Time Dynamic Programming and AO*; SARSA and Q-Learning Variants
Most of the planning algorithms are able to utilize optional forms of user knowledge in the form of action filters (a function that rules out certain action choices), heuristic functions (a function that provides a value estimate of a state), and an informed base policy. These forms of knowledge are often important to achieving top performance in a domain, though they are not necessary for running the planners. More detailed documentation for these features will be available in Fall 2013.
Planning Domains
The set of planning domains is growing and a significant set of new domains will be available in Fall 2013. The current set of domains are listed below.
  • Backgammon. A popular 2-player game involving dice.
  • Yahtzee. A popular 1-player dice game.
  • Biniax. An implementation of a 1-player video game.
  • Connect 4. A popular deterministic 2-player connection game.
  • EWN (EinStein würfelt nicht). A 2-player German dice game.
  • Galcon Fusion. An implementation of a 2-player abstract real-time strategy game.
  • Havannah. A 2-player connection game.
  • RCW. A 1-player conservation game where the goal is to maximize the spread of birds.
This material is based upon work supported by the National Science Foundation under Grant Numbers 0958482 and IIS-0905678.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.