A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents

Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games.

Bibliographic Details
Main Authors: Zhen Zhang, Dongqing Wang, Dongbin Zhao, Qiaoni Han, Tingting Song
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks
Online Access: https://ieeexplore.ieee.org/document/8517104/
id doaj-8eaebd97e1574567be7a7f56af0df567
record_format Article
spelling doaj-8eaebd97e1574567be7a7f56af0df567
Indexed: 2021-03-29T21:34:22Z
Language: English
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Published: 2018-01-01, volume 6, pages 70223-70235
DOI: 10.1109/ACCESS.2018.2878853
Article number: 8517104
Title: A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
Authors:
Zhen Zhang (https://orcid.org/0000-0002-6615-629X), School of Automation, Qingdao University, Qingdao, China
Dongqing Wang, School of Automation, Qingdao University, Qingdao, China
Dongbin Zhao, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Qiaoni Han, School of Automation, Qingdao University, Qingdao, China
Tingting Song, School of Automation, Qingdao University, Qingdao, China
Abstract: Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player-finite-action repeated game with two pure optimal joint actions where no common component action exists, both the optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. The PMR-EGA can be naturally extended to optimize cooperative stochastic games. Two stochastic games, i.e., box pushing and a distributed sensor network, are used as test beds. The simulations show that PMR-EGA consistently displays excellent performance in both stochastic games.
Online Access: https://ieeexplore.ieee.org/document/8517104/
Keywords: Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks
collection DOAJ
language English
format Article
sources DOAJ
author Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
spellingShingle Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
IEEE Access
Multi-agent reinforcement learning
gradient ascent
Q-learning
cooperative tasks
author_facet Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
author_sort Zhen Zhang
title A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_short A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_full A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_fullStr A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_full_unstemmed A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
title_sort gradient-based reinforcement learning algorithm for multiple cooperative agents
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player-finite-action repeated game with two pure optimal joint actions where no common component action exists, both the optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. The PMR-EGA can be naturally extended to optimize cooperative stochastic games. Two stochastic games, i.e., box pushing and a distributed sensor network, are used as test beds. The simulations show that PMR-EGA consistently displays excellent performance in both stochastic games.
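To make the setting in the abstract concrete, the following is a minimal, illustrative sketch of plain infinitesimal gradient ascent (IGA) in a two-agent cooperative repeated game with two pure optimal joint actions sharing no common component action, the game structure the paper analyzes. This is NOT the authors' PMR-IGA/PMR-EGA algorithm (whose update rule is not given in this record); the payoff matrix, step size, and initial strategies below are assumptions chosen for illustration.

```python
import numpy as np

# Shared reward for joint action (i, j). The two optimal joint actions,
# (0, 0) and (1, 1), share no common component action -- the game
# structure described in the abstract. (Illustrative payoffs, assumed.)
R = np.array([[10.0, 0.0],
              [0.0, 10.0]])

def expected_reward(p, q):
    """Expected shared reward when agent 1 plays action 0 with
    probability p and agent 2 plays action 0 with probability q."""
    x = np.array([p, 1.0 - p])
    y = np.array([q, 1.0 - q])
    return x @ R @ y

eta = 0.01          # small ("infinitesimal") step size, assumed
p, q = 0.60, 0.55   # arbitrary initial mixed strategies

for _ in range(2000):
    y = np.array([q, 1.0 - q])
    x = np.array([p, 1.0 - p])
    # Exact gradient of the expected reward wrt each agent's own strategy.
    dp = R[0] @ y - R[1] @ y
    dq = x @ R[:, 0] - x @ R[:, 1]
    # Simultaneous gradient ascent, projected back onto [0, 1].
    p = min(1.0, max(0.0, p + eta * dp))
    q = min(1.0, max(0.0, q + eta * dq))

print(p, q, expected_reward(p, q))  # converges to an optimal joint action
```

From these initial conditions both strategies climb toward the pure joint action (0, 0) and the expected reward approaches the maximum of 10. The paper's contribution, by contrast, is a model (PMR-IGA) and an estimated-gradient variant (PMR-EGA) for which both optimal joint actions are stable critical points and the maximum is reached from any initial condition.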
topic Multi-agent reinforcement learning
gradient ascent
Q-learning
cooperative tasks
url https://ieeexplore.ieee.org/document/8517104/
work_keys_str_mv AT zhenzhang agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongqingwang agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongbinzhao agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT qiaonihan agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT tingtingsong agradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT zhenzhang gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongqingwang gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT dongbinzhao gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT qiaonihan gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
AT tingtingsong gradientbasedreinforcementlearningalgorithmformultiplecooperativeagents
_version_ 1724192669332144128