Kagan Tumer's Publications

Counterfactual Exploration for Improving Multiagent Learning. M. Colby, S. Kharaghani, C. HolmesParker, and K. Tumer. In Proceedings of the Fourteenth International Joint Conference on Autonomous Agents and Multiagent Systems, Istanbul, Turkey, May 2015.

Abstract

In any single-agent system, exploration is a critical component of learning. It ensures that all possible actions receive some degree of attention, allowing an agent to converge to good policies. The same concept has been adopted by multiagent learning systems. However, there is a fundamentally different dynamic at play in multiagent learning: each agent operates in a non-stationary environment, as a direct result of the evolving policies of other agents in the system. As such, exploratory actions taken by agents bias the policies of other agents, forcing them to perform optimally in the presence of agent exploration. CLEAN rewards address this issue by privatizing exploration (agents take their best action, but internally compute rewards for counterfactual actions). However, CLEAN rewards require each agent to know the mathematical form of the system evaluation function, which is typically unavailable to agents. In this paper, we present an algorithm to approximate CLEAN rewards, eliminating exploratory action noise without the need for expert system knowledge. Results in both coordination and congestion domains demonstrate that the approximated CLEAN rewards obtain up to 95% of the performance of directly computed CLEAN rewards, without the need for expert domain knowledge, while utilizing 99% less information about the system.
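The privatized exploration described in the abstract can be illustrated with a minimal Python sketch. It assumes one form of counterfactual reward: the change in the system evaluation G when a single agent's executed action is swapped for a counterfactual one. That form, the toy congestion-style `system_evaluation`, and every function name below are illustrative assumptions, not taken from the paper. The sketch also evaluates G directly, which is exactly the knowledge the paper's approximation is meant to avoid.

```python
import math


def system_evaluation(counts, capacity=3.0):
    """Hypothetical congestion-style G: each resource contributes x * exp(-x / capacity)."""
    return sum(x * math.exp(-x / capacity) for x in counts)


def counts_for(joint_action, n_resources):
    """Count how many agents selected each resource."""
    counts = [0] * n_resources
    for action in joint_action:
        counts[action] += 1
    return counts


def counterfactual_reward(joint_action, agent, counterfactual, n_resources):
    """Assumed reward form: G with the agent's action swapped, minus G as executed."""
    executed = system_evaluation(counts_for(joint_action, n_resources))
    swapped = list(joint_action)
    swapped[agent] = counterfactual  # evaluated privately, never executed
    imagined = system_evaluation(counts_for(swapped, n_resources))
    return imagined - executed


def private_update(q_values, counterfactual, reward, alpha=0.1):
    """Update the value of the counterfactual action only; other agents never observe it."""
    q_values[counterfactual] += alpha * (reward - q_values[counterfactual])


# All agents execute their greedy actions; agent 0 privately evaluates resource 2.
joint_action = [0, 0, 0, 1]
q_agent0 = [0.0, 0.0, 0.0]
reward = counterfactual_reward(joint_action, agent=0, counterfactual=2, n_resources=3)
private_update(q_agent0, counterfactual=2, reward=reward)
```

In this toy setting, moving agent 0 off the crowded resource raises G, so the counterfactual reward is positive. The executed joint action never changes, so the other agents see no exploratory noise.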

BibTeX Entry

@inproceedings{tumer-colby_cleanapprox-aamas15,
author = {M. Colby and S. Kharaghani and C. HolmesParker and K. Tumer},
title = {Counterfactual Exploration for Improving Multiagent Learning},
booktitle = {Proceedings of the Fourteenth International Joint Conference on Autonomous Agents and Multiagent Systems},
month = {May},
pages = {},
address = {Istanbul, Turkey},
abstract = {In any single-agent system, exploration is a critical component of learning. It ensures that all possible actions receive some degree of attention, allowing an agent to converge to good policies. The same concept has been adopted by multiagent learning systems. However, there is a fundamentally different dynamic at play in multiagent learning: each agent operates in a non-stationary environment, as a direct result of the evolving policies of other agents in the system. As such, exploratory actions taken by agents bias the policies of other agents, forcing them to perform optimally in the presence of agent exploration. CLEAN rewards address this issue by privatizing exploration (agents take their best action, but internally compute rewards for counterfactual actions). However, CLEAN rewards require each agent to know the mathematical form of the system evaluation function, which is typically unavailable to agents. In this paper, we present an algorithm to approximate CLEAN rewards, eliminating exploratory action noise without the need for expert system knowledge. Results in both coordination and congestion domains demonstrate that the approximated CLEAN rewards obtain up to $95$\% of the performance of directly computed CLEAN rewards, without the need for expert domain knowledge, while utilizing $99$\% less information about the system.},
bib2html_pubtype = {Refereed Conference Papers},
bib2html_rescat = {Multiagent Systems},
year = {2015}
}
