Kagan Tumer's Publications



Potential-Based Difference Rewards for Multiagent Reinforcement Learning. S. Devlin, L. Yliniemi, D. Kudenko, and K. Tumer. In Proceedings of the Thirteenth International Joint Conference on Autonomous Agents and Multiagent Systems, Paris, France, May 2014.

Abstract

Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent's contribution to the system's performance. Potential-based reward shaping has been proven not to alter the Nash equilibria of the system, but it requires domain-specific knowledge. This paper introduces two novel reward functions that combine these methods to leverage the benefits of both.
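As a rough illustration of the two ingredients above, the following Python sketch computes a difference reward D_i(z) = G(z) − G(z_{-i}) and a potential-based shaping term F(s, s') = γΦ(s') − Φ(s). The toy domain (agents on integer positions, with G counting the distinct positions covered), the choice of simply removing agent i as the counterfactual, and the names `global_utility`, `difference_reward`, `shaping` and `discounted_shaping_sum` are all hypothetical and are not taken from the paper. The last function illustrates the usual intuition behind the equilibrium guarantee: the discounted shaping terms along a trajectory telescope to γ^T Φ(s_T) − Φ(s_0), so for a fixed start state and a potential that vanishes at termination, every policy's return is shifted by the same constant.

```python
from typing import Callable, Sequence

# Hypothetical toy domain: each agent occupies an integer position, and the
# system utility G is the number of distinct positions covered.
JointState = Sequence[int]


def global_utility(joint: JointState) -> float:
    """Toy system utility G(z): count of distinct positions occupied."""
    return float(len(set(joint)))


def difference_reward(joint: JointState, i: int) -> float:
    """D_i(z) = G(z) - G(z_{-i}); here the counterfactual simply removes agent i."""
    without_i = [pos for j, pos in enumerate(joint) if j != i]
    return global_utility(joint) - global_utility(without_i)


def shaping(phi: Callable[[JointState], float], s: JointState,
            s_next: JointState, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_sum(phi: Callable[[JointState], float],
                           trajectory: Sequence[JointState], gamma: float) -> float:
    """Sum of gamma^t * F(s_t, s_{t+1}) over consecutive states of a trajectory."""
    return sum(
        gamma ** t * shaping(phi, s, s_next, gamma)
        for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:]))
    )


if __name__ == "__main__":
    joint = (0, 0, 1)
    print([difference_reward(joint, i) for i in range(3)])  # [0.0, 0.0, 1.0]

    traj = [(0, 0, 1), (0, 1, 2), (1, 2, 3)]
    gamma = 0.9
    total = discounted_shaping_sum(global_utility, traj, gamma)
    # Telescoped form: gamma^T * Phi(s_T) - Phi(s_0), with T = len(traj) - 1.
    telescoped = gamma ** (len(traj) - 1) * global_utility(traj[-1]) - global_utility(traj[0])
    print(round(total, 6), round(telescoped, 6))  # 0.43 0.43
```

In the joint state `(0, 0, 1)`, agents 0 and 1 share position 0, so removing either leaves coverage unchanged and each receives a difference reward of zero; only agent 2, the sole occupant of position 1, is credited.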

Using the difference reward's Counterfactual as Potential (CaP) allows potential-based reward shaping to be applied to a wide range of multiagent systems without domain-specific knowledge, whilst still maintaining the theoretical guarantee of consistent Nash equilibria. Alternatively, Difference Rewards incorporating Potential-Based Reward Shaping (DRiP) uses potential-based reward shaping to further shape difference rewards. This paper demonstrates that agents using this approach, by exploiting prior knowledge of a problem domain, can converge either up to 23.8 times faster than, or to joint policies up to 196% better than, agents using difference rewards alone.
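A minimal Python sketch of how the two combinations described above might be assembled, under stated assumptions: CaP is read literally as shaping the global reward G with agent i's counterfactual utility G(z_{-i}) as the potential, and DRiP as adding a shaping term built from a domain-knowledge potential to the difference reward D_i. The paper's exact definitions (for example, the sign of the potential, or which state each term is evaluated in) may differ. The toy coverage domain, the reward being evaluated at the successor state s', and the `spread` heuristic standing in for "prior knowledge of a problem domain" are all hypothetical.

```python
from typing import Callable, Sequence

JointState = Sequence[int]  # hypothetical: one integer position per agent


def global_utility(joint: JointState) -> float:
    """Toy system utility G(z): number of distinct positions covered (hypothetical)."""
    return float(len(set(joint)))


def counterfactual_utility(joint: JointState, i: int) -> float:
    """G(z_{-i}): utility with agent i removed (assumed counterfactual)."""
    return global_utility([pos for j, pos in enumerate(joint) if j != i])


def difference_reward(joint: JointState, i: int) -> float:
    """D_i(z) = G(z) - G(z_{-i})."""
    return global_utility(joint) - counterfactual_utility(joint, i)


def cap_reward(s: JointState, s_next: JointState, i: int, gamma: float) -> float:
    """CaP sketch (assumed form): G(s') + gamma * Phi_i(s') - Phi_i(s), Phi_i = G(z_{-i})."""
    shaping = gamma * counterfactual_utility(s_next, i) - counterfactual_utility(s, i)
    return global_utility(s_next) + shaping


def drip_reward(s: JointState, s_next: JointState, i: int, gamma: float,
                phi: Callable[[JointState], float]) -> float:
    """DRiP sketch (assumed form): D_i(s') + gamma * phi(s') - phi(s)."""
    return difference_reward(s_next, i) + gamma * phi(s_next) - phi(s)


def spread(joint: JointState) -> float:
    """Hypothetical domain heuristic: a wider spread of agents tends to mean more coverage."""
    return float(max(joint) - min(joint))


if __name__ == "__main__":
    s, s_next = (0, 0, 1), (0, 1, 2)
    print(round(cap_reward(s, s_next, 0, 0.9), 6))           # 2.8
    print(round(drip_reward(s, s_next, 0, 0.9, spread), 6))  # 1.8
```

Both shaping terms have the form γΦ(s') − Φ(s), which is what ties these rewards to the potential-based shaping guarantee the abstract mentions; the difference between the two lies in the base reward (G for CaP, D_i for DRiP) and in where the potential comes from (the counterfactual itself versus external domain knowledge).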

Download

[PDF] 669.2 kB

BibTeX Entry

@inproceedings{tumer-devlin_aamas14,
        author = {S. Devlin and L. Yliniemi and D. Kudenko and K. Tumer},
        title = {Potential-Based Difference Rewards for Multiagent Reinforcement Learning},
        booktitle = {Proceedings of the Thirteenth International Joint Conference on Autonomous Agents and Multiagent Systems},
        month = {May},
        address = {Paris, France},
        abstract = {Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent's contribution to the system's performance. Potential-based reward shaping has been proven not to alter the Nash equilibria of the system, but it requires domain-specific knowledge. This paper introduces two novel reward functions that combine these methods to leverage the benefits of both.
        <p>
Using the difference reward's Counterfactual as Potential (CaP) allows potential-based reward shaping to be applied to a wide range of multiagent systems without domain-specific knowledge, whilst still maintaining the theoretical guarantee of consistent Nash equilibria. Alternatively, Difference Rewards incorporating Potential-Based Reward Shaping (DRiP) uses potential-based reward shaping to further shape difference rewards. This paper demonstrates that agents using this approach, by exploiting prior knowledge of a problem domain, can converge either up to 23.8 times faster than, or to joint policies up to 196% better than, agents using difference rewards alone.},
        bib2html_pubtype = {Refereed Conference Papers},
        bib2html_rescat = {Reinforcement Learning, Multiagent Systems},
        year = {2014}
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Jun 26, 2018 19:10:42