Kagan Tumer: Multiagent Learning for Black Box System Reward Functions

Multiagent Learning for Black Box System Reward Functions. K. Tumer and A. K. Agogino. Advances in Complex Systems, 12(4-5):475–492, 2009.

Abstract

In large, distributed systems composed of adaptive and interactive components (agents), ensuring the coordination among the agents so that the system achieves certain performance objectives is a challenging proposition. The key difficulty to overcome in such systems is one of credit assignment: how to apportion credit (or blame) to a particular agent based on the performance of the entire system. In this paper, we show how this problem can be solved in general for a large class of reward functions whose analytical form may be unknown (hence "black box" reward). This method combines the salient features of global solutions (e.g., "team games"), which are broadly applicable but provide poor solutions in large problems, with local but aligned solutions (e.g., "difference rewards"), which learn quickly but can be computationally burdensome. We introduce two estimates for the difference reward for a class of problems where the mapping from the agent actions to system reward functions can be decomposed into a linear combination of nonlinear functions of the agents' actions. We test our method's performance on a distributed marketing problem and an air traffic flow management problem and show a 44% performance improvement over team games and a speedup of order n for difference rewards (for an n-agent system).
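To make the computational burden mentioned in the abstract concrete, here is a minimal Python sketch of an exact difference reward computed against a black-box system reward. It is an illustration only and does not reproduce the two estimates introduced in the paper. It assumes the commonly used form D_i = G(z) - G(z with agent i's action replaced by a fixed default); the names `difference_rewards`, `toy_G`, and the `default` action are hypothetical, and the toy reward is simply one instance of a linear combination of nonlinear functions of the agents' actions.

```python
from typing import Callable, List, Sequence


def difference_rewards(
    G: Callable[[Sequence[float]], float],
    actions: Sequence[float],
    default: float,
) -> List[float]:
    """Exact difference reward for every agent, treating G as a black box.

    Assumed form: D_i = G(z) - G(z with agent i's action set to `default`).
    This needs one extra evaluation of G per agent, i.e. n + 1 calls in total.
    """
    base = G(actions)
    rewards = []
    for i in range(len(actions)):
        counterfactual = list(actions)
        counterfactual[i] = default  # remove agent i's contribution
        rewards.append(base - G(counterfactual))
    return rewards


def toy_G(actions: Sequence[float]) -> float:
    """Hypothetical system reward: weighted sum of squared actions."""
    weights = [1.0, 2.0, 3.0]
    return sum(w * a ** 2 for w, a in zip(weights, actions))


print(difference_rewards(toy_G, [1.0, 2.0, 3.0], 0.0))
```

In this sketch each agent's reward isolates its own term of the sum, which is what makes the signal easier to learn from than the shared team-game reward `toy_G(actions)`. The loop also shows why the exact computation scales with the number of agents when G can only be queried, not inspected.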

BibTeX Entry

@article{tumer-agogino_blackbox_acs09,
	author = {K. Tumer and A. K. Agogino},
	title = {Multiagent Learning for Black Box System Reward Functions},
	journal = {Advances in Complex Systems},
	volume = {12},
	number = {4-5},
	pages = {475--492},
	bib2html_pubtype = {Journal Articles},
	bib2html_rescat = {Multiagent Systems},
	abstract = {In large, distributed systems composed of adaptive and interactive components (agents), ensuring the coordination among the agents so that the system achieves certain performance objectives is a challenging proposition. The key difficulty to overcome in such systems is one of credit assignment: how to apportion credit (or blame) to a particular agent based on the performance of the entire system. In this paper, we show how this problem can be solved in general for a large class of reward functions whose analytical form may be unknown (hence ``black box'' reward). This method combines the salient features of global solutions (e.g., ``team games''), which are broadly applicable but provide poor solutions in large problems, with local but aligned solutions (e.g., ``difference rewards''), which learn quickly but can be computationally burdensome. We introduce two estimates for the difference reward for a class of problems where the mapping from the agent actions to system reward functions can be decomposed into a linear combination of nonlinear functions of the agents' actions. We test our method's performance on a distributed marketing problem and an air traffic flow management problem and show a 44\% performance improvement over team games and a speedup of order $n$ for difference rewards (for an $n$-agent system).},
	year = {2009}
}
