Kagan Tumer: Dirichlet-Multinomial Counterfactual Rewards for Heterogeneous Multiagent Systems

Kagan Tumer's Publications


Dirichlet-Multinomial Counterfactual Rewards for Heterogeneous Multiagent Systems. G. Dixit, N. Zerbel, and K. Tumer. In IEEE International Symposium on Multi-Robot and Multi-Agent Systems (MRS), Rutgers, NJ, August 2019.

Abstract

Multi-robot teams have been shown to be effective in accomplishing complex tasks that require tight coordination among team members. In homogeneous systems, recent work has demonstrated that “stepping stone” rewards are an effective way to provide agents with feedback on potentially valuable actions even when the agent-to-agent coupling requirements of an objective are not satisfied. In this work, we propose Dirichlet-Multinomial Counterfactual Selection (DMCS), a new mechanism for inferring hypothetical partners in tightly coupled, heterogeneous systems. By testing in a modified multi-rover exploration problem, we show that agents using DMCS can learn to infer appropriate counterfactual partners and thereby receive more informative stepping stone rewards. We also show that DMCS outperforms a random partner selection baseline by over 40%, and we demonstrate how domain knowledge can be used to induce a prior that guides the agent learning process. Finally, we show that DMCS maintains superior performance for up to 15 distinct rover types, whereas the performance of the baseline degrades rapidly.
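The abstract names the mechanism but does not give its update rule, so the following is only a minimal Python sketch of one way a Dirichlet-multinomial partner selector could work. It assumes that each agent keeps Dirichlet pseudo-counts over the K rover types, samples a categorical distribution from that Dirichlet (a Thompson-style step), draws a counterfactual partner type from it, and counts a positive stepping stone reward as one observation of that type. The class and function names (`DirichletPartnerSelector`, `random_partner`), the positive-reward success criterion, and the toy reward are all hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence


class DirichletPartnerSelector:
    """Hypothetical sketch of a Dirichlet-multinomial belief, held by one
    agent, over which of K rover types makes a useful counterfactual partner."""

    def __init__(self, num_types: int,
                 prior: Optional[Sequence[float]] = None,
                 seed: Optional[int] = None) -> None:
        if prior is None:
            prior = [1.0] * num_types  # uniform prior: no domain knowledge
        if len(prior) != num_types or any(a <= 0 for a in prior):
            raise ValueError("prior needs num_types positive pseudo-counts")
        # Dirichlet concentration parameters alpha_k (prior + observed counts).
        self.alpha: List[float] = [float(a) for a in prior]
        self.rng = random.Random(seed)

    def sample_type(self) -> int:
        """Draw p ~ Dirichlet(alpha), then a partner type k ~ Categorical(p)."""
        # Normalized independent Gamma(alpha_k, 1) draws are Dirichlet(alpha).
        gammas = [self.rng.gammavariate(a, 1.0) for a in self.alpha]
        total = sum(gammas)
        probs = [g / total for g in gammas]
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, partner_type: int, reward: float) -> None:
        """Assumed rule: a positive stepping stone reward counts as one
        multinomial observation of `partner_type`."""
        if reward > 0:
            self.alpha[partner_type] += 1.0

    def predictive(self) -> List[float]:
        """Posterior predictive probability of each type: alpha_k / sum(alpha)."""
        total = sum(self.alpha)
        return [a / total for a in self.alpha]


def random_partner(num_types: int, rng: random.Random) -> int:
    """Baseline: pick a counterfactual partner type uniformly at random."""
    return rng.randrange(num_types)


if __name__ == "__main__":
    # Toy stand-in for the rover domain (hypothetical): only type 2 can
    # satisfy the coupling requirement, so only it yields a positive reward.
    selector = DirichletPartnerSelector(num_types=4, seed=0)
    for _ in range(500):
        k = selector.sample_type()
        selector.update(k, 1.0 if k == 2 else 0.0)
    print([round(p, 3) for p in selector.predictive()])

    # Domain knowledge as a prior: extra pseudo-counts on a likely partner type.
    informed = DirichletPartnerSelector(num_types=4, prior=[1, 1, 10, 1])
    print([round(p, 3) for p in informed.predictive()])
```

In this sketch, domain knowledge enters only through the prior pseudo-counts, and the uniform random baseline ignores all feedback. The abstract does not say how the paper actually forms its prior or scores a counterfactual partner, so those parts should be read as placeholders.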


BibTeX Entry

@InProceedings{tumer-dixit_mrs19,
  author = {G. Dixit and N. Zerbel and K. Tumer},
  title = {Dirichlet-Multinomial Counterfactual Rewards for Heterogeneous Multiagent Systems},
  booktitle = {IEEE International Symposium on Multi-Robot and Multi-Agent Systems (MRS)},
  address = {Rutgers, NJ},
  month = {August},
  abstract = {Multi-robot teams have been shown to be effective in accomplishing complex tasks that require tight coordination among team members. In homogeneous systems, recent work has demonstrated that ``stepping stone'' rewards are an effective way to provide agents with feedback on potentially valuable actions even when the agent-to-agent coupling requirements of an objective are not satisfied. In this work, we propose Dirichlet-Multinomial Counterfactual Selection (DMCS), a new mechanism for inferring hypothetical partners in tightly coupled, heterogeneous systems. By testing in a modified multi-rover exploration problem, we show that agents using DMCS can learn to infer appropriate counterfactual partners and thereby receive more informative stepping stone rewards. We also show that DMCS outperforms a random partner selection baseline by over 40\%, and we demonstrate how domain knowledge can be used to induce a prior that guides the agent learning process. Finally, we show that DMCS maintains superior performance for up to 15 distinct rover types, whereas the performance of the baseline degrades rapidly.},
  bib2html_pubtype = {Refereed Conference Papers},
  bib2html_rescat = {Multiagent Systems},
  year = {2019}
}

Generated by bib2html.pl (written by Patrick Riley) on Wed Apr 01, 2020 17:39:43