Kagan Tumer's Publications



CLEAN Rewards for Improving Multiagent Coordination in the Presence of Exploration (extended abstract). C. HolmesParker, A. Agogino, and K. Tumer. In Proceedings of the Twelfth International Joint Conference on Autonomous Agents and Multiagent Systems, Minneapolis, MN, May 2013.

Abstract

In cooperative multiagent systems, coordinating the joint actions of agents is difficult. One of the fundamental difficulties in such multiagent systems is the slow learning process, where an agent may not only need to learn how to behave in a complex environment, but may also need to account for the actions of the other learning agents. Here, the inability of agents to distinguish the true environmental dynamics from those caused by the stochastic exploratory actions of other agents creates noise on each agent's reward signal. To address this, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards, which are agent-specific shaped rewards that effectively remove such learning noise from each agent's reward signal. We demonstrate their performance with up to 1000 agents in a standard congestion problem.
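The core idea above can be illustrated with a small sketch: in a congestion problem, all agents act greedily so the global reward is free of exploration noise, and each agent privately evaluates a counterfactual exploratory action offline. This is a hypothetical toy implementation for illustration, not the paper's exact formulation; `global_reward`, `clean_rewards`, and the capacity-based penalty are assumptions invented here.

```python
import random

def global_reward(counts, capacity=3):
    """Toy congestion-style global reward: each resource contributes its
    usage up to capacity, minus a penalty for overuse (an assumption)."""
    return sum(min(c, capacity) - 2 * max(0, c - capacity) for c in counts)

def clean_rewards(joint_action, n_resources, rng, capacity=3):
    """For each agent, score a privately sampled exploratory action
    against the greedy joint action, so no agent's exploration perturbs
    the reward signal any other agent observes."""
    counts = [0] * n_resources
    for a in joint_action:
        counts[a] += 1
    g = global_reward(counts, capacity)  # reward of the greedy joint action
    rewards = []
    for i, a_i in enumerate(joint_action):
        a_prime = rng.randrange(n_resources)  # private exploratory action
        # Counterfactually swap agent i's action, leaving everyone else fixed.
        counts[a_i] -= 1
        counts[a_prime] += 1
        rewards.append((a_prime, global_reward(counts, capacity) - g))
        # Undo the swap before evaluating the next agent.
        counts[a_prime] -= 1
        counts[a_i] += 1
    return rewards
```

Because the counterfactual is computed offline, the environment only ever sees the greedy joint action; each agent still gets a learning signal for its exploratory action without injecting noise into anyone else's reward.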

Download

[PDF] (193.6 kB)

BibTeX Entry

@inproceedings{tumer-holmesparker-clean_aamas13,
	author = {C. HolmesParker and A. Agogino and K. Tumer},
	title = {{CLEAN} Rewards for Improving Multiagent Coordination in the Presence of Exploration (extended abstract)},
	booktitle = {Proceedings of the Twelfth International Joint Conference on Autonomous Agents and Multiagent Systems},
	month = {May},
	address = {Minneapolis, MN},
	abstract = {In cooperative multiagent systems, coordinating the joint actions of agents is difficult. One of the fundamental difficulties in such multiagent systems is the slow learning process, where an agent may not only need to learn how to behave in a complex environment, but may also need to account for the actions of the other learning agents. Here, the inability of agents to distinguish the true environmental dynamics from those caused by the stochastic exploratory actions of other agents creates noise on each agent's reward signal. To address this, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards, which are agent-specific shaped rewards that effectively remove such learning noise from each agent's reward signal. We demonstrate their performance with up to 1000 agents in a standard congestion problem.},
	bib2html_pubtype = {Refereed Conference Papers},
	bib2html_rescat = {Multiagent Systems, Reinforcement Learning},
	year = {2013}
}

Generated by bib2html.pl (written by Patrick Riley) on Wed Apr 01, 2020 17:39:43