A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeat...
Main Authors: | Zhen Zhang, Dongqing Wang, Dongbin Zhao, Qiaoni Han, Tingting Song |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2018-01-01 |
Series: | IEEE Access |
Subjects: | Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks |
Online Access: | https://ieeexplore.ieee.org/document/8517104/ |
id |
doaj-8eaebd97e1574567be7a7f56af0df567 |
---|---|
record_format |
Article |
spelling |
doaj-8eaebd97e1574567be7a7f56af0df567; 2021-03-29T21:34:22Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2018-01-01; vol. 6, pp. 70223-70235; DOI 10.1109/ACCESS.2018.2878853; document 8517104; A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents; Zhen Zhang (https://orcid.org/0000-0002-6615-629X), School of Automation, Qingdao University, Qingdao, China; Dongqing Wang, School of Automation, Qingdao University, Qingdao, China; Dongbin Zhao, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Qiaoni Han, School of Automation, Qingdao University, Qingdao, China; Tingting Song, School of Automation, Qingdao University, Qingdao, China; https://ieeexplore.ieee.org/document/8517104/; Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Zhen Zhang, Dongqing Wang, Dongbin Zhao, Qiaoni Han, Tingting Song |
title |
A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2018-01-01 |
description |
Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on the infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player-finite-action repeated game with two pure optimal joint actions where no common component action exists, both the optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. The PMR-EGA can be naturally extended to optimize cooperative stochastic games. Two stochastic games, i.e., box pushing and a distributed sensor network, are used as test beds. The simulations show that the PMR-EGA consistently displays excellent performance for both stochastic games. |
topic |
Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks |
url |
https://ieeexplore.ieee.org/document/8517104/ |