Kagan Tumer: Multiagent Learning of Choices via Simpler MDPs and Reward Shaping

Kagan Tumer's Publications


Multiagent Learning of Choices via Simpler MDPs and Reward Shaping. A. Iscen and K. Tumer. In AAMAS-2012 Workshop on Adaptive and Learning Agents, Valencia, Spain, June 2012.

Abstract

In many multiagent learning problems, complexity can be reduced by exploiting the repetitive nature of the problem. For this reason, the Reinforcement Learning (RL) literature contains many hierarchical RL methods that decompose a given task into subtasks that need to be accomplished to reach a defined goal. On the other hand, some single-agent and multiagent problems require achieving only one of many similar goals. These problems force the agent to choose one subtask instead of accomplishing all subtasks. To exploit this type of problem, in which the choices share a repetitive structure, we introduce a method that uses High Level Evaluation of Low Level MDPs (HELM), where a low-level MDP is used for each of the subtask choices of the original MDP. Inspired by hierarchical schemas, this simpler but parameterized MDP is used both to learn to achieve a subtask and to evaluate different subtask options for the agent. To use the schema in a multiagent setting, we extend the method with a subtask-based version of difference rewards, a reward shaping method proven to work for cooperative multiagent systems. The algorithm is tested on the multi-rover problem, where rovers cooperatively learn to observe points of interest (POIs) in the environment. The results show that agents using the combination of HELM and subtask-based difference rewards achieve significant improvements in both learning speed and converged policies.

Download

(unavailable)

BibTeX Entry

@incollection{tumer-iscen_ala12,
        author = {A. Iscen  and K. Tumer},
        title = {Multiagent Learning of Choices via Simpler MDPs and Reward Shaping},
        booktitle = {AAMAS-2012 Workshop on Adaptive and Learning Agents},
	month = {June},
	address = {Valencia, Spain},
	editor = {E. Howley and P. Vrancx and M. Knudson},
	abstract={In many multiagent learning problems, complexity can be reduced by exploiting the repetitive nature of the problem. For this reason, the Reinforcement Learning (RL) literature contains many hierarchical RL methods that decompose a given task into subtasks that need to be accomplished to reach a defined goal. On the other hand, some single-agent and multiagent problems require achieving only one of many similar goals. These problems force the agent to choose one subtask instead of accomplishing all subtasks. To exploit this type of problem, in which the choices share a repetitive structure, we introduce a method that uses High Level Evaluation of Low Level MDPs (HELM), where a low-level MDP is used for each of the subtask choices of the original MDP. Inspired by hierarchical schemas, this simpler but parameterized MDP is used both to learn to achieve a subtask and to evaluate different subtask options for the agent. To use the schema in a multiagent setting, we extend the method with a subtask-based version of difference rewards, a reward shaping method proven to work for cooperative multiagent systems. The algorithm is tested on the multi-rover problem, where rovers cooperatively learn to observe points of interest (POIs) in the environment. The results show that agents using the combination of HELM and subtask-based difference rewards achieve significant improvements in both learning speed and converged policies.},
	bib2html_pubtype = {Workshop/Symposium Papers},
	bib2html_rescat = {Multiagent Systems, Reinforcement Learning},
        year = {2012}
}
