A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents

Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games.

Bibliographic Details
Main Authors: Zhen Zhang, Dongqing Wang, Dongbin Zhao, Qiaoni Han, Tingting Song
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks
Online Access: https://ieeexplore.ieee.org/document/8517104/
id doaj-8eaebd97e1574567be7a7f56af0df567
record_format Article
spelling doaj-8eaebd97e1574567be7a7f56af0df567
Indexed: 2021-03-29T21:34:22Z
Language: English
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Published: 2018-01-01, volume 6, pages 70223-70235
DOI: 10.1109/ACCESS.2018.2878853
Article number: 8517104
Title: A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
Authors:
Zhen Zhang (https://orcid.org/0000-0002-6615-629X), School of Automation, Qingdao University, Qingdao, China
Dongqing Wang, School of Automation, Qingdao University, Qingdao, China
Dongbin Zhao, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Qiaoni Han, School of Automation, Qingdao University, Qingdao, China
Tingting Song, School of Automation, Qingdao University, Qingdao, China
Abstract: Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player-finite-action repeated game with two pure optimal joint actions where no common component action exists, both the optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. The PMR-EGA can be naturally extended to optimize cooperative stochastic games. Two stochastic games, i.e., box pushing and a distributed sensor network, are used as test beds. The simulations show that PMR-EGA consistently displays excellent performance in both stochastic games.
Online Access: https://ieeexplore.ieee.org/document/8517104/
Keywords: Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks
collection DOAJ
language English
format Article
sources DOAJ
author Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
spellingShingle Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
IEEE Access
Multi-agent reinforcement learning
gradient ascent
Q-learning
cooperative tasks
author_facet Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
author_sort Zhen Zhang
title A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_short A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_full A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_fullStr A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_full_unstemmed A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_sort gradient-based reinforcement learning algorithm for multiple cooperative agents
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player-finite-action repeated game with two pure optimal joint actions where no common component action exists, both the optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. The PMR-EGA can be naturally extended to optimize cooperative stochastic games. Two stochastic games, i.e., box pushing and a distributed sensor network, are used as test beds. The simulations show that PMR-EGA consistently displays excellent performance in both stochastic games.
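To make the setting in the abstract concrete, the following is a minimal, illustrative sketch of plain infinitesimal gradient ascent (IGA) in a two-agent cooperative repeated game with two pure optimal joint actions sharing no common component action, the game structure the paper analyzes. This is NOT the authors' PMR-IGA/PMR-EGA algorithm (whose update rule is not given in this record); the payoff matrix, step size, and initial strategies below are assumptions chosen for illustration.

```python
import numpy as np

# Shared reward for joint action (i, j). The two optimal joint actions,
# (0, 0) and (1, 1), share no common component action -- the game
# structure described in the abstract. (Illustrative payoffs, assumed.)
R = np.array([[10.0, 0.0],
              [0.0, 10.0]])

def expected_reward(p, q):
    """Expected shared reward when agent 1 plays action 0 with
    probability p and agent 2 plays action 0 with probability q."""
    x = np.array([p, 1.0 - p])
    y = np.array([q, 1.0 - q])
    return x @ R @ y

eta = 0.01          # small ("infinitesimal") step size, assumed
p, q = 0.60, 0.55   # arbitrary initial mixed strategies

for _ in range(2000):
    y = np.array([q, 1.0 - q])
    x = np.array([p, 1.0 - p])
    # Exact gradient of the expected reward wrt each agent's own strategy.
    dp = R[0] @ y - R[1] @ y
    dq = x @ R[:, 0] - x @ R[:, 1]
    # Simultaneous gradient ascent, projected back onto [0, 1].
    p = min(1.0, max(0.0, p + eta * dp))
    q = min(1.0, max(0.0, q + eta * dq))

print(p, q, expected_reward(p, q))  # converges to an optimal joint action
```

From these initial conditions both strategies climb toward the pure joint action (0, 0) and the expected reward approaches the maximum of 10. The paper's contribution, by contrast, is a model (PMR-IGA) and an estimated-gradient variant (PMR-EGA) for which both optimal joint actions are stable critical points and the maximum is reached from any initial condition.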
topic Multi-agent reinforcement learning
gradient ascent
Q-learning
cooperative tasks
url https://ieeexplore.ieee.org/document/8517104/
work_keys_str_mv AT zhenzhang agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongqingwang agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongbinzhao agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT qiaonihan agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT tingtingsong agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT zhenzhang gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongqingwang gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongbinzhao gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT qiaonihan gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT tingtingsong gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
_version_ 1724192669332144128