Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Even with the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can produce drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers have to perform many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed when training a DRL model. The method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. When returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks and various target scores demonstrate that the policy return method can reduce the required number of trials by 10% to 40% compared with the corresponding original algorithms, and by 10% to 30% compared with state-of-the-art algorithms.
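The abstract describes the policy return mechanism only in prose: checkpoint the policy, roll back when training diverges or stagnates, and perturb the restored weights with a small amount of stochastic data. A minimal sketch of how such a safeguard could be attached to a DRL training loop is given below; the class name, the divergence and stagnation tests, and the noise_fraction parameter are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a policy-return-style safeguard for a DRL training loop.
# The thresholds, noise scheme, and class API are assumptions made for illustration;
# they are not taken from the paper.
import copy

import torch


class PolicyReturnGuard:
    """Checkpoints policy weights and rolls back when training diverges or stalls."""

    def __init__(self, policy_net, noise_fraction=0.05, patience=20):
        self.policy_net = policy_net
        self.noise_fraction = noise_fraction  # share of stochastic data mixed into the weights
        self.patience = patience              # evaluations without improvement before "stagnant"
        self.best_score = float("-inf")
        self.best_state = copy.deepcopy(policy_net.state_dict())
        self.stale_evals = 0

    def update(self, eval_score):
        """Call after each policy evaluation; returns True if a rollback was performed."""
        if eval_score > self.best_score:
            self.best_score = eval_score
            self.best_state = copy.deepcopy(self.policy_net.state_dict())
            self.stale_evals = 0
            return False

        self.stale_evals += 1
        diverged = eval_score < 0.5 * self.best_score  # crude divergence test (assumption)
        stagnant = self.stale_evals >= self.patience   # crude stagnation test (assumption)
        if diverged or stagnant:
            self._return_to_previous_state()
            self.stale_evals = 0
            return True
        return False

    def _return_to_previous_state(self):
        # Restore the best checkpoint, then add a small percentage of random noise
        # to the weights so the policy does not simply repeat the same decline.
        self.policy_net.load_state_dict(self.best_state)
        with torch.no_grad():
            for param in self.policy_net.parameters():
                noise = torch.randn_like(param) * param.abs().mean() * self.noise_fraction
                param.add_(noise)
```

In use, a training loop would call guard.update(score) after each evaluation and continue training from the restored, perturbed weights whenever it returns True, instead of abandoning the trial and starting a new one.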

Bibliographic Details
Main Authors: Feng Liu, Shuling Dai, Yongjia Zhao
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Deep reinforcement learning; policy return method; fewer trials; stochastic data
Online Access: https://ieeexplore.ieee.org/document/9298771/
id doaj-ce74b71c5e014fbbac96e1e91fa156c3
record_format Article
spelling doaj-ce74b71c5e014fbbac96e1e91fa156c3 (indexed 2021-03-30T04:25:07Z)
Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning
IEEE Access, vol. 8, pp. 228099-228107, 2020-01-01. Publisher: IEEE. ISSN: 2169-3536. Language: English.
DOI: 10.1109/ACCESS.2020.3045835. IEEE article number: 9298771.
Feng Liu (https://orcid.org/0000-0002-9006-4520), State Key Laboratory of VR Technology & Systems, Beihang University (BUAA), Beijing, China
Shuling Dai (https://orcid.org/0000-0002-2934-9033), State Key Laboratory of VR Technology & Systems, Beihang University (BUAA), Beijing, China
Yongjia Zhao (https://orcid.org/0000-0002-4557-9066), State Key Laboratory of VR Technology & Systems, Beihang University (BUAA), Beijing, China
Online access: https://ieeexplore.ieee.org/document/9298771/
Keywords: Deep reinforcement learning; policy return method; fewer trials; stochastic data
collection DOAJ
language English
format Article
sources DOAJ
author Feng Liu
Shuling Dai
Yongjia Zhao
title Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Even with the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can produce drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers have to perform many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed when training a DRL model. The method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. When returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks and various target scores demonstrate that the policy return method can reduce the required number of trials by 10% to 40% compared with the corresponding original algorithms, and by 10% to 30% compared with state-of-the-art algorithms.
topic Deep reinforcement learning
policy return method
fewer trials
stochastic data
url https://ieeexplore.ieee.org/document/9298771/