A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems

Adaptive dynamic programming (ADP) is generally implemented with three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by backpropagation, so its weights easily become trapped in a local minimum; moreover, both the critic network and the action network are trained in every outer loop, which is time-consuming. To approximate the optimal control policy more accurately and to shorten training, this study proposes a nearer-optimal and faster-trained value iteration ADP for discrete-time nonlinear systems. First, before the model network is trained by backpropagation, a global search method, i.e., a genetic algorithm, evolves the network's weights and biases for a few generations. Second, in the outer-loop training process, a trigger mechanism decides whether the action network needs to be trained in a given iteration, which saves considerable training time. Examples of both linear and nonlinear systems are used to compare the proposed method with conventional value iteration ADP. The simulation results show that the proposed algorithm yields a control policy closer to optimal and requires less training time than the conventional value iteration ADP.
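The first idea is to seed the model network's weights with a short genetic-algorithm search before handing them to backpropagation, so gradient training starts from a better basin than a random initialization. The sketch below is a minimal illustration of that scheme, not the authors' implementation: the network sizes, GA settings, and the select/crossover/mutate operators are all assumptions, and the toy linear system exists only to exercise the code.

```python
# Hedged sketch: GA pre-training of a model network before backprop.
# All sizes and GA hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_OUT = 3, 8, 2          # assumed model-network layer sizes
POP, GENS, ELITE, SIGMA = 30, 10, 5, 0.1


def unpack(theta):
    """Split a flat parameter vector into (W1, b1, W2, b2)."""
    i = 0
    W1 = theta[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = theta[i:i + N_HID]; i += N_HID
    W2 = theta[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = theta[i:i + N_OUT]
    return W1, b1, W2, b2


def model_forward(theta, xu):
    """One-hidden-layer model network: predicts the next state from (x, u)."""
    W1, b1, W2, b2 = unpack(theta)
    return np.tanh(xu @ W1 + b1) @ W2 + b2


def fitness(theta, xu, x_next):
    """Negative mean-squared prediction error (higher is better)."""
    return -np.mean((model_forward(theta, xu) - x_next) ** 2)


def ga_pretrain(xu, x_next):
    """Evolve weights/biases for a few generations; return the best vector,
    which would then be used as the starting point for backpropagation."""
    dim = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT
    pop = rng.normal(0.0, 1.0, (POP, dim))
    for _ in range(GENS):
        scores = np.array([fitness(p, xu, x_next) for p in pop])
        elite = pop[np.argsort(scores)[-ELITE:]]          # selection
        parents = elite[rng.integers(0, ELITE, (POP, 2))]
        mask = rng.random((POP, dim)) < 0.5               # uniform crossover
        pop = np.where(mask, parents[:, 0], parents[:, 1])
        pop += rng.normal(0.0, SIGMA, pop.shape)          # mutation
        pop[0] = elite[-1]                                # keep the best
    scores = np.array([fitness(p, xu, x_next) for p in pop])
    return pop[np.argmax(scores)]


# Toy system data: x_next = A x + B u, just to exercise the sketch.
x = rng.normal(size=(200, 2)); u = rng.normal(size=(200, 1))
A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[0.0], [0.5]])
theta0 = ga_pretrain(np.hstack([x, u]), x @ A.T + u @ B.T)
```
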

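The second idea is the trigger mechanism: the critic is updated in every outer-loop iteration, but the action network is retrained only when a trigger condition fires. The paper's actual trigger rule is not given in this record, so the sketch below assumes a simple threshold on the change of the value function between iterations, demonstrated on a hypothetical scalar linear system with tabular stand-ins for the two networks.

```python
# Hedged sketch: value iteration with a trigger on the action update.
# The trigger rule (a value-change threshold) is an assumption, not the
# paper's; the critic and action network are replaced by lookup tables.
import numpy as np

# Assumed toy scalar system x' = 0.8 x + u with stage cost U(x,u) = x^2 + u^2.
def step(x, u): return 0.8 * x + u
def cost(x, u): return x ** 2 + u ** 2

xs = np.linspace(-1.0, 1.0, 41)              # sampled states
us = np.linspace(-1.0, 1.0, 81)              # candidate controls
V = np.zeros_like(xs)                        # critic stand-in: V_0 = 0
policy = np.zeros_like(xs)                   # action-network stand-in
TRIGGER_TOL = 1e-3                           # assumed trigger threshold
action_updates = 0

for i in range(200):
    # Critic update (every iteration): V_{i+1}(x) = min_u [U(x,u) + V_i(x')]
    Q = cost(xs[:, None], us[None, :]) + np.interp(
        np.clip(step(xs[:, None], us[None, :]), -1, 1), xs, V)
    V_new = Q.min(axis=1)

    # Trigger mechanism: retrain the action network only if the value
    # function still changed noticeably; otherwise skip it to save time.
    if np.max(np.abs(V_new - V)) > TRIGGER_TOL:
        policy = us[Q.argmin(axis=1)]        # greedy policy refit
        action_updates += 1
    V = V_new

print(f"outer iterations: {i + 1}, action-network updates: {action_updates}")
```

On this toy problem the value iterates stop changing after a few dozen iterations, so most later outer loops skip the action update entirely; that skipped work is the source of the training-time savings the abstract claims.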

Bibliographic Details
Main Authors: Junping Hu, Gen Yang (ORCID: 0000-0001-8606-5523), Zhicheng Hou (ORCID: 0000-0002-5319-9856), Gong Zhang, Wenlin Yang (ORCID: 0000-0002-6725-182X), Weijun Wang (ORCID: 0000-0001-6011-2598)
Affiliations: College of Mechanical and Electrical Engineering, Central South University, Changsha, China (J. Hu, G. Yang); Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou, China (Z. Hou, G. Zhang, W. Yang, W. Wang)
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, Vol. 9, pp. 14933-14944
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3051984
Source: DOAJ
Subjects: ADP; value iteration; genetic algorithm; trigger mechanism
Online Access: https://ieeexplore.ieee.org/document/9326299/