H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

This paper presents a novel off-policy game Q-learning algorithm to solve the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution lies in the fact that the Q-learning strategy employed in the proposed algorithm is implemented via off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players cooperate to minimize their common performance index while counteracting the disturbance, which tries to maximize that index; ultimately they reach the Nash equilibrium of the game, thereby satisfying the disturbance attenuation condition. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then developed within the typical adaptive dynamic programming (ADP) and game-theoretic architecture, such that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof that the solution obtained by the proposed off-policy game Q-learning algorithm is an unbiased solution of the Nash equilibrium is presented. Comparative simulation results are provided to verify the effectiveness and demonstrate the advantages of the proposed method.
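The abstract summarizes the formulation without equations. The following is a hedged sketch of the standard zero-sum game setup behind such discrete-time multi-player H∞ results; every symbol (A, B_i, E, Q, R_i, γ, π) is an illustrative assumption, not notation quoted from the paper.

```latex
% Hedged reconstruction of the usual multi-player H-infinity game setup;
% all symbols here are illustrative assumptions, not quoted from the paper.
% Dynamics with N players u_i and a disturbance w:
%   x_{k+1} = A x_k + \sum_{i=1}^{N} B_i u_{i,k} + E w_k
% Common performance index, minimized by the players, maximized by w:
\[
J = \sum_{k=0}^{\infty}\Big( x_k^{\top} Q x_k
    + \sum_{i=1}^{N} u_{i,k}^{\top} R_i u_{i,k}
    - \gamma^{2}\, w_k^{\top} w_k \Big)
\]
% Policy evaluation in Q-learning rests on the Bellman identity below.
% The off-policy trait: the data actions u_{i,k}, w_k may come from
% exploratory behavior policies, while the next-state actions are
% recomputed from the current target policies pi_i, pi_w:
\[
Q^{\pi}(x_k, u_{1,k},\dots,u_{N,k}, w_k)
  = r(x_k, u_{1,k},\dots,u_{N,k}, w_k)
  + Q^{\pi}\big(x_{k+1}, \pi_1(x_{k+1}),\dots,\pi_N(x_{k+1}),
                \pi_w(x_{k+1})\big)
\]
```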

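To make the procedure concrete, here is a minimal, self-contained numerical sketch of off-policy game Q-learning for a hypothetical two-player system. Everything below (the matrices A, B1, B2, E, the weights, γ = 2, the dimensions, and the tolerances) is invented for illustration; the paper's actual algorithm, tuning, and convergence conditions may differ.

```python
# Minimal numerical sketch of off-policy game Q-learning for a hypothetical
# two-player discrete-time linear system with a disturbance. All matrices,
# weights, gamma, and dimensions are invented for illustration; they are
# NOT taken from the paper, whose algorithm details may differ.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dynamics: x_{k+1} = A x + B1 u1 + B2 u2 + E w
A  = np.array([[0.9, 0.1],
               [0.0, 0.8]])
B1 = np.array([[0.5], [0.0]])
B2 = np.array([[0.0], [0.5]])
E  = np.array([[0.1], [0.1]])
Qx, R1, R2, gamma = np.eye(2), np.eye(1), np.eye(1), 2.0
n, m1, m2, q = 2, 1, 1, 1
nz = n + m1 + m2 + q                      # z = [x; u1; u2; w]

def quad_basis(z):
    """Upper-triangular monomials so that z'Hz = quad_basis(z) @ theta."""
    z = z.ravel()
    return np.array([(1.0 if i == j else 2.0) * z[i] * z[j]
                     for i in range(nz) for j in range(i, nz)])

def theta_to_H(theta):
    """Rebuild the symmetric Q-function kernel H from the LS estimate."""
    H, k = np.zeros((nz, nz)), 0
    for i in range(nz):
        for j in range(i, nz):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

# One batch of data under exploratory behavior policies. Off-policy trait:
# this single batch is reused for every policy-evaluation step below.
X, U1, U2, W, Xn = [], [], [], [], []
x = np.array([[1.0], [-1.0]])
for _ in range(300):
    u1, u2, w = (0.5 * rng.standard_normal((d, 1)) for d in (m1, m2, q))
    xn = A @ x + B1 @ u1 + B2 @ u2 + E @ w
    X.append(x); U1.append(u1); U2.append(u2); W.append(w); Xn.append(xn)
    x = xn

# Policy iteration on the game Q-function Q(x, u1, u2, w) = z'Hz.
K1, K2, Kw = np.zeros((m1, n)), np.zeros((m2, n)), np.zeros((q, n))
for it in range(30):
    Phi, y = [], []
    for x, u1, u2, w, xn in zip(X, U1, U2, W, Xn):
        r = (x.T @ Qx @ x + u1.T @ R1 @ u1 + u2.T @ R2 @ u2
             - gamma**2 * (w.T @ w)).item()
        z  = np.vstack([x, u1, u2, w])                      # behavior actions
        zn = np.vstack([xn, -K1 @ xn, -K2 @ xn, -Kw @ xn])  # target actions
        Phi.append(quad_basis(z) - quad_basis(zn))
        y.append(r)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = theta_to_H(theta)
    # Joint improvement: stationarity of Q in (u1, u2, w) at the saddle
    # point gives [u1; u2; w] = -Haa^{-1} Hax x (assumes gamma is above
    # the attenuation bound so the saddle second-order conditions hold).
    Kall = np.linalg.solve(H[n:, n:], H[n:, :n])
    K1n, K2n, Kwn = Kall[:m1], Kall[m1:m1 + m2], Kall[m1 + m2:]
    gap = max(np.abs(K1n - K1).max(), np.abs(K2n - K2).max(),
              np.abs(Kwn - Kw).max())
    K1, K2, Kw = K1n, K2n, Kwn
    if gap < 1e-9:
        break

print("iterations:", it + 1)
print("K1 =", K1, "\nK2 =", K2, "\nKw =", Kw)
```

Under these toy assumptions the gains typically converge in a handful of iterations. The defining off-policy feature is visible in the structure: the exploratory batch is never regenerated; only the target-policy actions at x_{k+1} are recomputed from the current gains between iterations.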
Bibliographic Details
Main Authors: Jinna Li, Zhenfei Xiao
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: H∞ control; off-policy Q-learning; game theory; Nash equilibrium
Online Access: https://ieeexplore.ieee.org/document/8977468/
Record ID: doaj-cc4f37c562be45628cbdaeca5dfef146
Collection/Source: DOAJ
Record Format: Article
ISSN: 2169-3536
Volume/Pages: IEEE Access, vol. 8 (2020), pp. 28831-28846
DOI: 10.1109/ACCESS.2020.2970760
IEEE Article Number: 8977468
Author ORCID: Jinna Li, https://orcid.org/0000-0001-9985-6308
Affiliation (both authors): School of Information and Control Engineering, Liaoning Shihua University, Liaoning, China
Record Updated: 2021-03-30