Kagan Tumer's Publications


Abstract

The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Although final system performance is the best indicator of whether a given reward structure is suitable, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties that promote coordination among the agents and give agents rewards with high signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains that are ill-suited to the simple table-backup schemes commonly used in TD(λ) and Q-learning; in such domains, the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm.

In this paper, we present a new reward evaluation method that visualizes the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and depends only on the problem domain and the agents' reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-orders-of-magnitude speedup in selecting good rewards compared with running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, yielding rewards that combine the best properties of traditional rewards.
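The abstract does not spell out how the coordination and learning-difficulty properties are measured. As a rough illustration only, the Python sketch below estimates two such quantities by random sampling in a made-up toy domain. The domain (agents pick actions in [0, 1] and must jointly hit a target sum), the three reward functions, and both metric definitions are assumptions for this sketch, not the paper's multi-rover domain, method, or code. Here "alignment" is the fraction of random changes to one agent's action for which that agent's reward and the system reward move in the same direction, and "learnability" is the average reward change caused by the agent's own action divided by the average change caused by all other agents' actions. Each reward then corresponds to one (alignment, learnability) point; the sketch stops at computing those coordinates and does not reproduce the paper's visualization.

```python
import random

# Hypothetical toy domain (NOT the paper's multi-rover domain).
N_AGENTS = 20          # hypothetical team size
TARGET = N_AGENTS / 2  # hypothetical target for the summed actions


def global_reward(z):
    """Toy system reward G(z): agents must jointly hit TARGET with their summed actions."""
    return -(sum(z) - TARGET) ** 2


def global_agent_reward(z, i):
    """Every agent simply receives the full system reward G."""
    return global_reward(z)


def difference_reward(z, i):
    """Difference-style reward D_i(z) = G(z) - G(z without agent i) (assumed form)."""
    without_i = z[:i] + z[i + 1:]
    return global_reward(z) - global_reward(without_i)


def local_reward(z, i):
    """Hypothetical selfish reward that depends only on agent i's own action."""
    return z[i]


def random_state(rng):
    """Draw one joint action: each agent's action is uniform on [0, 1]."""
    return [rng.random() for _ in range(N_AGENTS)]


def evaluate(reward, i=0, samples=5000, seed=0):
    """Return sampled (alignment, learnability) estimates for agent i under `reward`."""
    rng = random.Random(seed)
    aligned = 0
    own_sens = 0.0
    others_sens = 0.0
    for _ in range(samples):
        z = random_state(rng)
        # Change only agent i's action; check whether reward and G move together.
        z_own = list(z)
        z_own[i] = rng.random()
        d_reward = reward(z_own, i) - reward(z, i)
        d_system = global_reward(z_own) - global_reward(z)
        if d_reward * d_system >= 0:
            aligned += 1
        own_sens += abs(d_reward)
        # Change every action except agent i's; measure how much the reward moves.
        z_others = random_state(rng)
        z_others[i] = z[i]
        others_sens += abs(reward(z_others, i) - reward(z, i))
    alignment = aligned / samples
    # A reward untouched by other agents gets an infinite ratio.
    learnability = own_sens / others_sens if others_sens > 0 else float("inf")
    return alignment, learnability


if __name__ == "__main__":
    for name, r in [("global", global_agent_reward),
                    ("difference", difference_reward),
                    ("local", local_reward)]:
        a, l = evaluate(r)
        print(f"{name:>10}: alignment={a:.2f} learnability={l:.2f}")
```

In this toy, the global reward is fully aligned but comparatively hard to learn from, the local reward ignores other agents (infinite ratio) but is aligned only about half the time, and the difference-style reward keeps full alignment while improving learnability over the global reward. This mirrors, in miniature, the kind of tradeoff the abstract describes; it is not evidence about the paper's results.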

BibTeX Entry

@article{tumer-agogino_jaamas08,
author = {A. K. Agogino and K. Tumer},
title = {Analyzing and Visualizing Multiagent Rewards in Dynamic and Stochastic Environments},
journal = {Journal of Autonomous Agents and Multi-Agent Systems},
volume = {17},
number = {2},
pages = {320--338},
bib2html_pubtype = {Journal Articles},
bib2html_rescat = {Reinforcement Learning, Multiagent Systems},
abstract = {
The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Although final system performance is the best indicator of whether a given reward structure is suitable, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties that promote coordination among the agents and give agents rewards with high signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains that are ill-suited to the simple table-backup schemes commonly used in TD($\lambda$) and Q-learning; in such domains, the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm.
In this paper, we present a new reward evaluation method that visualizes the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and depends only on the problem domain and the agents' reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-orders-of-magnitude speedup in selecting good rewards compared with running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, yielding rewards that combine the best properties of traditional rewards.},
year = {2008}
}


Generated by bib2html.pl (written by Patrick Riley) on Tue Jun 26, 2018 19:10:42