Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay

The traditional deep deterministic policy gradient (DDPG) algorithm suffers from slow convergence and a tendency to become trapped in local optima. To address these two problems, this paper proposes a DDPG algorithm based on a double-network prioritized experience replay mechanism (DNPER-DDPG). First, the value function is approximated by two neural networks, and the smaller of the two action-value estimates is used to update the actor policy network, which reduces the chance of converging to a locally optimal policy. Second, the Q values produced by the two networks and the immediate reward returned by the environment are used as the criteria for prioritization, ranking the samples in the experience replay buffer by importance and thereby improving the convergence speed of the algorithm. Finally, the improved method is evaluated on classic control environments from OpenAI Gym, where it achieves faster convergence and higher cumulative reward than the comparison algorithms.
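
The abstract describes two mechanisms: taking the minimum of two critics' action-value estimates when updating the actor, and assigning replay priorities from the two Q values together with the immediate reward. The sketch below illustrates both ideas in PyTorch as a minimal reading of the abstract; the Critic class, the critic_targets and sample_priority helpers, the hidden-layer size, and the priority formula are illustrative assumptions, not the authors' published implementation.

```python
# Hedged sketch of the two ideas named in the abstract (PyTorch).
# Network sizes, helper names, and the priority formula are assumptions.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Q(s, a) approximator; the hidden size of 256 is an assumption."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def critic_targets(q1_target, q2_target, actor_target,
                   next_state, reward, done, gamma=0.99):
    """Double-network target: take the element-wise minimum of the two
    target critics, the mechanism the abstract credits with reducing
    convergence to locally optimal policies."""
    with torch.no_grad():
        next_action = actor_target(next_state)
        q_min = torch.min(q1_target(next_state, next_action),
                          q2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_min


def sample_priority(q1, q2, reward):
    """Illustrative priority combining the two critics' estimates with the
    immediate reward, as the abstract describes; the exact weighting is an
    assumption, since the paper's formula is not reproduced here."""
    return (reward.abs() + (q1 - q2).abs()).squeeze(-1)
```

A replay buffer that samples transitions in proportion to sample_priority would complete the picture; the buffer itself is omitted here for brevity.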

Bibliographic Details
Main Authors: Chaohai Kang, Chuiting Rong (ORCID: 0000-0001-5961-2165), Weijian Ren (ORCID: 0000-0001-9279-1951), Fengcai Huo, Pengyun Liu
Author Affiliation: College of Electrical and Information Engineering, Northeast Petroleum University, Daqing, China
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, Vol. 9, pp. 60296-60308
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3074535
Subjects: Continuous action space; deep deterministic policy gradient; experience replay mechanism; function approximation error; priority division
Online Access: https://ieeexplore.ieee.org/document/9409070/