Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.
Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round...
Main Authors: | Borja Fernandez-Gauna, Ismael Etxeberria-Agiriano, Manuel Graña |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2015-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC4497621?pdf=render |
id | doaj-336f1c736d20458f9ee8715b19d51a03 |
---|---|
record_format | Article |
spelling | doaj-336f1c736d20458f9ee8715b19d51a03, 2020-11-24T21:58:38Z, eng, Public Library of Science (PLoS), PLoS ONE, 1932-6203, 2015-01-01, 10(7): e0127129, doi: 10.1371/journal.pone.0127129 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Borja Fernandez-Gauna, Ismael Etxeberria-Agiriano, Manuel Graña |
author_sort | Borja Fernandez-Gauna |
title | Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning. |
publisher | Public Library of Science (PLoS) |
series | PLoS ONE |
issn | 1932-6203 |
publishDate | 2015-01-01 |
description | Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out concurrently by the agents. In this paper we formalize and prove the convergence of a Distributed Round-Robin Q-Learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out a round-robin scheduling of action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs that lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function is learned in an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the globally optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots. |
url | http://europepmc.org/articles/PMC4497621?pdf=render |
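
The abstract highlights two ideas that lend themselves to a short illustration: a round-robin turn order, so each agent learns against an environment that appears stationary, and vetoes on state-action pairs that lead to undesired terminal states (UTS). The Python sketch below illustrates only these two ideas under stated assumptions; the environment interface (`env.reset()`, `env.step(agent, action)` returning next state, reward, done flag, and UTS flag), the flat per-agent Q-tables, and the hyperparameter values are illustrative assumptions, and the paper's message-passing coordination of local policies and its exact update rule are not shown.

```python
# Illustrative sketch of round-robin Q-learning with simple state-action
# vetoes. The environment interface and reward handling are assumptions for
# demonstration, not the authors' implementation.
import random
from collections import defaultdict


class RoundRobinQLearner:
    def __init__(self, n_agents, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_agents = n_agents
        self.actions = list(actions)            # per-agent action set
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # One independent Q-table per agent: Q[i][(state, action)] -> value.
        self.Q = [defaultdict(float) for _ in range(n_agents)]
        # Vetoed (state, action) pairs per agent, filled whenever an action
        # leads to an undesired terminal state (UTS).
        self.vetoes = [set() for _ in range(n_agents)]

    def select_action(self, agent, state):
        allowed = [a for a in self.actions if (state, a) not in self.vetoes[agent]]
        if not allowed:                         # all actions vetoed: fall back
            allowed = self.actions
        if random.random() < self.epsilon:      # epsilon-greedy exploration
            return random.choice(allowed)
        return max(allowed, key=lambda a: self.Q[agent][(state, a)])

    def update(self, agent, s, a, r, s_next, done, uts):
        if uts:
            self.vetoes[agent].add((s, a))      # veto pairs that reach a UTS
        best_next = 0.0 if done else max(
            self.Q[agent][(s_next, a2)] for a2 in self.actions)
        td_target = r + self.gamma * best_next
        self.Q[agent][(s, a)] += self.alpha * (td_target - self.Q[agent][(s, a)])


def train_episode(env, learner):
    """Agents act one at a time in a fixed round-robin order, so each agent
    observes an environment that is stationary during its own turn."""
    state, done = env.reset(), False
    while not done:
        for agent in range(learner.n_agents):   # round-robin turn order
            action = learner.select_action(agent, state)
            next_state, reward, done, uts = env.step(agent, action)
            learner.update(agent, state, action, reward, next_state, done, uts)
            state = next_state
            if done:
                break
```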