H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

This paper presents a novel off-policy game Q-learning algorithm to solve the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution lies in the fact that the Q-learning strategy employed in the proposed algorithm is implemented via off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players cooperate to minimize their common performance index while counteracting the disturbance, which tries to maximize that index; ultimately they reach the Nash equilibrium of the game, thereby satisfying the disturbance attenuation condition. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then developed within the typical adaptive dynamic programming (ADP) and game-theoretic architecture, such that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof that the solution obtained by the proposed off-policy game Q-learning algorithm is an unbiased solution of the Nash equilibrium is presented. Comparative simulation results are provided to verify the effectiveness and demonstrate the advantages of the proposed method.
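The abstract summarizes the formulation without equations. The following is a hedged sketch of the standard zero-sum game setup behind such discrete-time multi-player H∞ results; every symbol (A, B_i, E, Q, R_i, γ, π) is an illustrative assumption, not notation quoted from the paper.

```latex
% Hedged reconstruction of the usual multi-player H-infinity game setup;
% all symbols here are illustrative assumptions, not quoted from the paper.
% Dynamics with N players u_i and a disturbance w:
%   x_{k+1} = A x_k + \sum_{i=1}^{N} B_i u_{i,k} + E w_k
% Common performance index, minimized by the players, maximized by w:
\[
J = \sum_{k=0}^{\infty}\Big( x_k^{\top} Q x_k
    + \sum_{i=1}^{N} u_{i,k}^{\top} R_i u_{i,k}
    - \gamma^{2}\, w_k^{\top} w_k \Big)
\]
% Policy evaluation in Q-learning rests on the Bellman identity below.
% The off-policy trait: the data actions u_{i,k}, w_k may come from
% exploratory behavior policies, while the next-state actions are
% recomputed from the current target policies pi_i, pi_w:
\[
Q^{\pi}(x_k, u_{1,k},\dots,u_{N,k}, w_k)
  = r(x_k, u_{1,k},\dots,u_{N,k}, w_k)
  + Q^{\pi}\big(x_{k+1}, \pi_1(x_{k+1}),\dots,\pi_N(x_{k+1}),
                \pi_w(x_{k+1})\big)
\]
```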

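To make the procedure concrete, here is a minimal, self-contained numerical sketch of off-policy game Q-learning for a hypothetical two-player system. Everything below (the matrices A, B1, B2, E, the weights, γ = 2, the dimensions, and the tolerances) is invented for illustration; the paper's actual algorithm, tuning, and convergence conditions may differ.

```python
# Minimal numerical sketch of off-policy game Q-learning for a hypothetical
# two-player discrete-time linear system with a disturbance. All matrices,
# weights, gamma, and dimensions are invented for illustration; they are
# NOT taken from the paper, whose algorithm details may differ.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dynamics: x_{k+1} = A x + B1 u1 + B2 u2 + E w
A  = np.array([[0.9, 0.1],
               [0.0, 0.8]])
B1 = np.array([[0.5], [0.0]])
B2 = np.array([[0.0], [0.5]])
E  = np.array([[0.1], [0.1]])
Qx, R1, R2, gamma = np.eye(2), np.eye(1), np.eye(1), 2.0
n, m1, m2, q = 2, 1, 1, 1
nz = n + m1 + m2 + q                      # z = [x; u1; u2; w]

def quad_basis(z):
    """Upper-triangular monomials so that z'Hz = quad_basis(z) @ theta."""
    z = z.ravel()
    return np.array([(1.0 if i == j else 2.0) * z[i] * z[j]
                     for i in range(nz) for j in range(i, nz)])

def theta_to_H(theta):
    """Rebuild the symmetric Q-function kernel H from the LS estimate."""
    H, k = np.zeros((nz, nz)), 0
    for i in range(nz):
        for j in range(i, nz):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

# One batch of data under exploratory behavior policies. Off-policy trait:
# this single batch is reused for every policy-evaluation step below.
X, U1, U2, W, Xn = [], [], [], [], []
x = np.array([[1.0], [-1.0]])
for _ in range(300):
    u1, u2, w = (0.5 * rng.standard_normal((d, 1)) for d in (m1, m2, q))
    xn = A @ x + B1 @ u1 + B2 @ u2 + E @ w
    X.append(x); U1.append(u1); U2.append(u2); W.append(w); Xn.append(xn)
    x = xn

# Policy iteration on the game Q-function Q(x, u1, u2, w) = z'Hz.
K1, K2, Kw = np.zeros((m1, n)), np.zeros((m2, n)), np.zeros((q, n))
for it in range(30):
    Phi, y = [], []
    for x, u1, u2, w, xn in zip(X, U1, U2, W, Xn):
        r = (x.T @ Qx @ x + u1.T @ R1 @ u1 + u2.T @ R2 @ u2
             - gamma**2 * (w.T @ w)).item()
        z  = np.vstack([x, u1, u2, w])                      # behavior actions
        zn = np.vstack([xn, -K1 @ xn, -K2 @ xn, -Kw @ xn])  # target actions
        Phi.append(quad_basis(z) - quad_basis(zn))
        y.append(r)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = theta_to_H(theta)
    # Joint improvement: stationarity of Q in (u1, u2, w) at the saddle
    # point gives [u1; u2; w] = -Haa^{-1} Hax x (assumes gamma is above
    # the attenuation bound so the saddle second-order conditions hold).
    Kall = np.linalg.solve(H[n:, n:], H[n:, :n])
    K1n, K2n, Kwn = Kall[:m1], Kall[m1:m1 + m2], Kall[m1 + m2:]
    gap = max(np.abs(K1n - K1).max(), np.abs(K2n - K2).max(),
              np.abs(Kwn - Kw).max())
    K1, K2, Kw = K1n, K2n, Kwn
    if gap < 1e-9:
        break

print("iterations:", it + 1)
print("K1 =", K1, "\nK2 =", K2, "\nKw =", Kw)
```

Under these toy assumptions the gains typically converge in a handful of iterations. The defining off-policy feature is visible in the structure: the exploratory batch is never regenerated; only the target-policy actions at x_{k+1} are recomputed from the current gains between iterations.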
Bibliographic Details
Main Authors: Jinna Li, Zhenfei Xiao
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: H∞ control; off-policy Q-learning; game theory; Nash equilibrium
Online Access: https://ieeexplore.ieee.org/document/8977468/
Record ID: doaj-cc4f37c562be45628cbdaeca5dfef146
Collection/Source: DOAJ
Record Format: Article
ISSN: 2169-3536
Volume/Pages: IEEE Access, vol. 8 (2020), pp. 28831-28846
DOI: 10.1109/ACCESS.2020.2970760
IEEE Article Number: 8977468
Author ORCID: Jinna Li, https://orcid.org/0000-0001-9985-6308
Affiliation (both authors): School of Information and Control Engineering, Liaoning Shihua University, Liaoning, China
Record Updated: 2021-03-30