Averaged Soft Actor-Critic for Deep Reinforcement Learning
With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional, large-scale artificial intelligence tasks. However, the instability and insecurity of DRL algorithms significantly affect their performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value networks, which alleviates some of these problems, but SAC still suffers from overestimation. To reduce the error caused by this overestimation, we propose a new SAC algorithm called Averaged-SAC. By averaging the previously learned state-action value estimates, it mitigates the overestimation problem of soft Q-learning, leading to a more stable training process and improved performance. We evaluate Averaged-SAC on several tasks in the MuJoCo environment. The experimental results show that Averaged-SAC effectively improves both the performance of the SAC algorithm and the stability of its training process.
| Field | Value |
|---|---|
| Main Authors | Feng Ding, Guanfeng Ma, Zhikui Chen, Jing Gao, Peng Li (School of Software Technology) |
| Format | Article |
| Language | English |
| Published | Hindawi-Wiley, 2021-01-01 |
| Series | Complexity |
| ISSN | 1099-0526 |
| Online Access | http://dx.doi.org/10.1155/2021/6658724 |
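The abstract's core idea, averaging previously learned state-action value estimates to damp overestimation in the soft Q-target, can be sketched as follows. This is a minimal illustration only, assuming a PyTorch-style critic that maps (state, action) to a Q-value and a policy object with a `sample(state)` method returning an action and its log-probability; the class names, the snapshot count `k`, and the hyperparameters `gamma` and `alpha` are assumptions, not the authors' reference implementation.

```python
# Illustrative sketch of an averaged soft Q-target: the Bellman backup uses the
# mean of the k most recently learned critic snapshots instead of a single
# target network, which reduces overestimation (as described in the abstract).
from collections import deque
import copy

import torch


class AveragedSoftQTarget:
    """Keep the k most recent critic snapshots and average their estimates."""

    def __init__(self, critic: torch.nn.Module, k: int = 5):
        # Start with one frozen copy so averaged_q is usable immediately.
        self.snapshots = deque([self._freeze(critic)], maxlen=k)

    @staticmethod
    def _freeze(critic: torch.nn.Module) -> torch.nn.Module:
        snap = copy.deepcopy(critic)
        for p in snap.parameters():
            p.requires_grad_(False)
        return snap

    def push(self, critic: torch.nn.Module) -> None:
        """Store a frozen copy of the critic after each gradient update."""
        self.snapshots.append(self._freeze(critic))

    def averaged_q(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        """Average the previously learned state-action value estimates."""
        qs = torch.stack([q(state, action) for q in self.snapshots], dim=0)
        return qs.mean(dim=0)


def soft_q_target(avg_q, policy, reward, next_state, done, gamma=0.99, alpha=0.2):
    """Entropy-regularized (soft) Bellman target built from the averaged critic."""
    with torch.no_grad():
        next_action, log_prob = policy.sample(next_state)  # assumed policy interface
        soft_value = avg_q.averaged_q(next_state, next_action) - alpha * log_prob
        return reward + gamma * (1.0 - done) * soft_value
```

After each critic update, `push` stores a frozen copy of the critic, and the soft Bellman target then uses the mean of the stored estimates rather than a single target network's value; this averaged estimate is the mechanism the abstract credits for the more stable training process.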