Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning
Using the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can produce drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers have to perform many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials required when training a DRL model. The method allows the learned policy to return to a previous state whenever it becomes divergent or stagnant at any stage of training. When returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks and various target scores demonstrate that the policy return method reduces the required number of trials by roughly 10% to 40% compared with the corresponding original algorithms, and by 10% to 30% compared with state-of-the-art algorithms.
Main Authors: | Feng Liu (ORCID: 0000-0002-9006-4520), Shuling Dai (ORCID: 0000-0002-2934-9033), Yongjia Zhao (ORCID: 0000-0002-4557-9066) |
---|---|
Affiliation: | State Key Laboratory of VR Technology & Systems, Beihang University (BUAA), Beijing, China |
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
ISSN: | 2169-3536 |
Volume / Pages: | 8 / 228099-228107 |
DOI: | 10.1109/ACCESS.2020.3045835 |
Subjects: | Deep reinforcement learning; policy return method; fewer trials; stochastic data |
Online Access: | https://ieeexplore.ieee.org/document/9298771/ |
Record ID: | doaj-ce74b71c5e014fbbac96e1e91fa156c3 |
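The abstract describes the policy return mechanism only at a high level: checkpoint the policy, return to that checkpoint when training diverges or stagnates, and perturb a fraction of the weights with stochastic noise so that the same decline is not repeated. Below is a minimal illustrative sketch of such a loop; the `Agent` interface (`get_weights`, `set_weights`, `train_one_epoch`, `evaluate`), the noise fraction and scale, and the patience threshold are all assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch of the "policy return" idea from the abstract:
# keep a checkpoint of the best policy weights, and when training diverges
# or stagnates, revert to the checkpoint while adding stochastic noise to a
# fraction of the weights so training does not repeat the same decline.
# All names and hyperparameters here are hypothetical.
import copy
import numpy as np


def perturb(weights, fraction=0.05, scale=0.01, rng=None):
    """Add Gaussian noise to a random subset (`fraction`) of each weight array."""
    rng = rng or np.random.default_rng()
    noisy = []
    for w in weights:
        w = w.copy()
        mask = rng.random(w.shape) < fraction
        w[mask] += rng.normal(0.0, scale, size=w.shape)[mask]
        noisy.append(w)
    return noisy


def train_with_policy_return(agent, env, epochs=100, patience=5):
    """Train `agent`, returning to a perturbed checkpoint on divergence/stagnation."""
    best_score = -np.inf
    best_weights = copy.deepcopy(agent.get_weights())  # list of numpy arrays assumed
    stagnant = 0
    for _ in range(epochs):
        agent.train_one_epoch(env)       # assumed training step
        score = agent.evaluate(env)      # assumed evaluation step
        if score > best_score:
            best_score = score
            best_weights = copy.deepcopy(agent.get_weights())
            stagnant = 0
        else:
            stagnant += 1
        # "Return" when the policy diverges or has stagnated for too long.
        if stagnant >= patience or not np.isfinite(score):
            agent.set_weights(perturb(best_weights))
            stagnant = 0
    return best_weights
```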