Kagan Tumer's Publications



CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning (Extended Abstract). C. HolmesParker, M. Taylor, A. Agogino, and K. Tumer. In Proceedings of the Thirteenth International Joint Conference on Autonomous Agents and Multiagent Systems, Paris, France, May 2014.

Abstract

Coordinating the joint-actions of agents in cooperative multiagent systems is difficult. Learning in such multiagent systems can be slow because an agent may not only need to learn how to behave in a complex environment, but also to account for the actions of other learning agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce two types of Coordinated Learning without Exploratory Action Noise (CLEAN) rewards that allow an agent to estimate the counterfactual reward it would have received had it taken an alternative action. We empirically show that CLEAN rewards outperform agents using both traditional global rewards and shaped difference rewards in two domains.
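The core mechanism described above, estimating the reward an agent would have received for an alternative action while the other agents' actions are held fixed, can be illustrated with a short sketch. This is a minimal illustration only: the toy global reward function, the function names, and the difference-style formulation are assumptions for exposition, not the paper's exact CLEAN reward definitions.

    # Minimal sketch of counterfactual reward estimation in the spirit of
    # the abstract above. All names and the toy objective are illustrative
    # assumptions, not the paper's CLEAN formulation.

    def global_reward(joint_action):
        """Placeholder system-level reward G(z) over the joint action.
        In practice this would be the domain's evaluation function."""
        return -abs(sum(joint_action) - 10)  # toy objective: actions should sum to 10

    def counterfactual_reward(joint_action, agent_idx, alt_action):
        """Estimate the reward agent `agent_idx` would have received had it
        taken `alt_action`, holding the other agents' (possibly noisy,
        exploratory) actions fixed."""
        counterfactual = list(joint_action)
        counterfactual[agent_idx] = alt_action
        # Difference-style shaping: credit only the change attributable to
        # this agent's action choice.
        return global_reward(counterfactual) - global_reward(joint_action)

    # Example: agent 1 evaluates an alternative action it never executed.
    actual = [3, 2, 4]
    print(counterfactual_reward(actual, agent_idx=1, alt_action=3))  # prints 1

Because the comparison holds the other agents' actions fixed, their exploratory choices appear identically in both terms and cancel out of the difference, which is the intuition behind removing exploratory action noise from each agent's reward signal.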

Download

[PDF] (195.6 kB)

BibTeX Entry

@inproceedings{tumer-holmesparker_aamas14,
  author = {C. HolmesParker and M. Taylor and A. Agogino and K. Tumer},
  title = {CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning (Extended Abstract)},
  booktitle = {Proceedings of the Thirteenth International Joint Conference on Autonomous Agents and Multiagent Systems},
  month = {May},
  pages = {},
  address = {Paris, France},
  abstract = {Coordinating the joint-actions of agents in cooperative multiagent systems is difficult. Learning in such multiagent systems can be slow because an agent may not only need to learn how to behave in a complex environment, but also to account for the actions of other learning agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce two types of Coordinated Learning without Exploratory Action Noise (CLEAN) rewards that allow an agent to estimate the counterfactual reward it would have received had it taken an alternative action. We empirically show that CLEAN rewards outperform agents using both traditional global rewards and shaped difference rewards in two domains.},
  bib2html_pubtype = {Refereed Conference Papers},
  bib2html_rescat = {Reinforcement Learning, Multiagent Systems},
  year = {2014}
}
