Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

With the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) produces drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers have to perform many trials...


Bibliographic Details
Main Authors: Feng Liu, Shuling Dai, Yongjia Zhao
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9298771/