H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning
This paper presents a novel off-policy game Q-learning algorithm to solve the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution is that the Q-learning strategy is implemented as off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players cooperate to minimize a common performance index while counteracting the disturbance, which tries to maximize that index; the players ultimately reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then developed within the typical adaptive dynamic programming (ADP) and game-theoretic architecture, so that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof is given that the proposed off-policy game Q-learning algorithm introduces no bias into the Nash equilibrium solution. Comparative simulation results verify the effectiveness and demonstrate the advantages of the proposed method.
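For orientation, this line of work typically casts the H∞ problem as a zero-sum game between the control players and the disturbance. The formulation below is the standard textbook form, written here as an assumed reference point; the paper's exact performance index and notation may differ.

```latex
% Standard discrete-time H-infinity game setup (assumed notation, not
% copied from the paper): N control players minimize J, the disturbance
% w maximizes it.
\[
  x_{k+1} = A x_k + \sum_{i=1}^{N} B_i u_k^i + E w_k, \qquad
  J = \sum_{k=0}^{\infty} \Big( x_k^\top Q x_k
        + \sum_{i=1}^{N} (u_k^i)^\top R_i\, u_k^i
        - \gamma^2 w_k^\top w_k \Big),
\]
% and the closed loop satisfies the disturbance attenuation condition
\[
  \sum_{k=0}^{\infty} \Big( x_k^\top Q x_k
        + \sum_{i=1}^{N} (u_k^i)^\top R_i\, u_k^i \Big)
  \;\le\; \gamma^2 \sum_{k=0}^{\infty} w_k^\top w_k .
\]
```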
Main Authors: Jinna Li (ORCID: https://orcid.org/0000-0001-9985-6308); Zhenfei Xiao (both: School of Information and Control Engineering, Liaoning Shihua University, Liaoning, China)
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, vol. 8, pp. 28831-28846 (ISSN 2169-3536)
DOI: 10.1109/ACCESS.2020.2970760
Subjects: H∞ control; off-policy Q-learning; game theory; Nash equilibrium
Online Access: https://ieeexplore.ieee.org/document/8977468/
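The record contains no code, so as a rough illustration of the kind of procedure the abstract describes, here is a minimal Python sketch of off-policy Q-learning policy iteration for a toy discrete-time linear-quadratic zero-sum game. Every matrix, dimension, and the attenuation level gamma below are invented for the demo rather than taken from the paper, and the single control channel u stands in for the paper's multiple players (which would stack into u). The dynamics matrices appear only to synthesize the "measured" data; the learner itself never touches them.

```python
# Hedged sketch: off-policy Q-learning policy iteration for a toy
# discrete-time LQ zero-sum game (one controller u, one disturbance w).
# All values are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Dynamics, used ONLY to generate measured data (unknown to the learner).
A = np.array([[0.8, 0.3], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])   # control channel
E = np.array([[0.2], [0.1]])   # disturbance channel
Qx, R, gamma = np.eye(2), np.eye(1), 5.0

n, mu, mw = 2, 1, 1
m = n + mu + mw                         # dimension of z = [x; u; w]
iu = np.triu_indices(m)
scale = np.where(iu[0] == iu[1], 1.0, 2.0)

def features(z):
    """phi(z) such that phi(z) @ svec(H) == z.T @ H @ z for symmetric H."""
    return scale * np.outer(z, z)[iu]

def unsvec(theta):
    """Rebuild the symmetric Q-function kernel H from its svec vector."""
    H = np.zeros((m, m))
    H[iu] = theta
    return (H + H.T) - np.diag(np.diag(H))

# Collect ONE batch of exploratory (behavior-policy) data; off-policy
# learning lets this same batch be reused at every iteration.
N = 200
X = rng.normal(size=(N, n))
U = rng.normal(size=(N, mu))
W = rng.normal(size=(N, mw))
Xn = X @ A.T + U @ B.T + W @ E.T        # measured next states

K = np.zeros((mu, n))                   # initial admissible policies
L = np.zeros((mw, n))
for _ in range(30):
    # Policy evaluation: least-squares solve of the game Bellman equation
    # Q(z_k) = r_k + Q(z_{k+1}) with z_{k+1} under the TARGET policies.
    Phi, r = [], []
    for x, u, w, xn in zip(X, U, W, Xn):
        z = np.concatenate([x, u, w])
        zn = np.concatenate([xn, -K @ xn, -L @ xn])
        Phi.append(features(z) - features(zn))
        r.append(x @ Qx @ x + u @ R @ u - gamma**2 * (w @ w))
    theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(r), rcond=None)
    H = unsvec(theta)

    # Policy improvement: stationarity of Q in (u, w) gives the saddle point
    #   [[Huu, Huw], [Hwu, Hww]] [u; w] = -[Hux; Hwx] x,  u = -Kx, w = -Lx.
    Hux, Hwx = H[n:n + mu, :n], H[n + mu:, :n]
    M = H[n:, n:]                        # [[Huu, Huw], [Hwu, Hww]]
    KL = np.linalg.solve(M, np.vstack([Hux, Hwx]))
    K_new, L_new = KL[:mu], KL[mu:]
    done = np.linalg.norm(K_new - K) + np.linalg.norm(L_new - L) < 1e-8
    K, L = K_new, L_new
    if done:
        break

print("learned controller gain K:\n", K)
print("learned worst-case disturbance gain L:\n", L)
```

Because the Bellman evaluation simply re-targets the same batch of exploratory data at each iteration, no new data must be collected as the policies change; this reuse is the practical advantage of off-policy learning that the abstract highlights.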