Kagan Tumer's Publications


Multiagent Learning with a Noisy Global Reward Signal. S. Proper and K. Tumer. In Proceedings of the Twenty Seventh AAAI Conference on Artificial Intelligence (AAAI-13), Bellevue, WA, July 2013.

Abstract

Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function approximation that allows the quick computation of local rewards. We demonstrate how this model can result in significant improvements in behavior for three congestion problems: a multiagent "bar problem", a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be either learned on- or off-line using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200% over learning using the global reward directly. For the air traffic problem, we show a decrease in costs of 25% over learning using the global reward directly.
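
The approach summarized above, fitting a model of the global reward G and then reading per-agent difference rewards D_i = G(z) - G(z_{-i}) off that model instead of the real system, can be illustrated with a small sketch. The toy bar-problem setup, the attendance-count features, and all names and parameters below are illustrative assumptions, not the paper's implementation.

# Illustrative sketch (assumptions, not the paper's code): fit a linear model
# of the global reward G from observed (joint action, reward) samples, then
# compute per-agent difference rewards D_i = G_hat(z) - G_hat(z_{-i}) from the
# model rather than from the noisy, expensive real system.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 20    # agents picking a "night" in a toy bar problem
N_ACTIONS = 5    # available nights
CAPACITY = 4     # attendance beyond which a night becomes congested


def attendance(actions):
    """Per-night attendance counts, used as features for the model of G."""
    return np.bincount(actions, minlength=N_ACTIONS).astype(float)


def global_reward(actions):
    """Noisy global reward: each night's payoff grows with attendance but
    decays once attendance exceeds CAPACITY (bar-problem congestion)."""
    counts = attendance(actions)
    return float(np.sum(counts * np.exp(-counts / CAPACITY))) + rng.normal(scale=0.5)


# 1. Collect (joint action, observed global reward) samples and fit a linear
#    approximation G_hat(z) = attendance(z) . w (done off-line here).
n_samples = 500
joint_actions = [rng.integers(N_ACTIONS, size=N_AGENTS) for _ in range(n_samples)]
X = np.array([attendance(z) for z in joint_actions])
y = np.array([global_reward(z) for z in joint_actions])
w, *_ = np.linalg.lstsq(X, y, rcond=None)


def g_hat(actions):
    """Model-based estimate of the global reward."""
    return attendance(actions) @ w


def difference_reward(actions, i):
    """D_i computed entirely from the model: compare G_hat with and without
    agent i's contribution to the attendance counts."""
    counts = attendance(actions)
    counts_without_i = counts.copy()
    counts_without_i[actions[i]] -= 1.0
    return counts @ w - counts_without_i @ w


z = rng.integers(N_ACTIONS, size=N_AGENTS)
print("G_hat(z) =", round(g_hat(z), 3))
print("difference rewards:", np.round([difference_reward(z, i) for i in range(N_AGENTS)], 3))

With a linear model, each counterfactual evaluation is a single dot product (here it collapses to the weight of agent i's chosen night), which is what makes the local rewards cheap to compute once the model of the global reward has been learned.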

Download

[PDF] 172.5kB

BibTeX Entry

@inproceedings{tumer-proper_aaai13,
        author = {S. Proper and K. Tumer},
        title = {Multiagent Learning with a Noisy Global Reward Signal},
        booktitle = {Proceedings of the Twenty Seventh AAAI Conference on Artificial Intelligence (AAAI-13)},
        month = {July},
        address = {Bellevue, WA},
        abstract = {Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function approximation that allows the quick computation of local rewards. We demonstrate how this model can result in significant improvements in behavior for three congestion problems: a multiagent ``bar problem'', a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be either learned on- or off-line using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200\% over learning using the global reward directly. For the air traffic problem, we show a decrease in costs of 25\% over learning using the global reward directly.},
        bib2html_pubtype = {Refereed Conference Papers},
        bib2html_rescat = {Multiagent Systems, Reinforcement Learning},
        year = {2013}
}
