An Improved Deep Reinforcement Learning with Sparse Rewards
基於稀疏報酬改良深度加強式學習
Master's thesis (碩士), National Sun Yat-sen University (國立中山大學), Department of Electrical Engineering (電機工程學系研究所), academic year 107 (2018)
Main Author: Lu-cheng Chi (紀律呈)
Other Author: Kao-Shing Hwang (黃國勝)
Format: Others (學位論文 / thesis, 43 pages)
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/eq94pr
id: ndltd-TW-107NSYS5442010
record_format: oai_dc
collection: NDLTD
sources: NDLTD
Description:
In reinforcement learning, how an agent should explore in an environment with sparse rewards is a long-standing problem. The improved deep reinforcement learning method described in this thesis encourages the agent to explore unvisited environmental states in such an environment.
In deep reinforcement learning, an agent typically feeds an image observation from the environment directly into its neural network. However, other, often neglected observations from the environment, such as depth, might provide valuable information.
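As a rough illustration of how such a neglected observation can be exploited, the sketch below (PyTorch assumed; the layer sizes, input shape, and all names are hypothetical, not taken from the thesis) encodes the image observation with a convolutional trunk and predicts the depth observation from the shared features; the next paragraph builds the thesis's method around exactly this kind of image-to-observation predictor:

```python
import torch
import torch.nn as nn

class HeteroEncoder(nn.Module):
    """CNN used as a hetero-encoder: image observation in, depth map out."""
    def __init__(self):
        super().__init__()
        # Convolutional trunk over an 84x84 single-channel image observation.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Sequential(nn.Linear(32 * 9 * 9, 256), nn.ReLU())
        # Auxiliary head: predicts the (flattened) depth observation, which
        # serves as the supervised target described in the abstract.
        self.depth_head = nn.Linear(256, 84 * 84)

    def forward(self, image):                 # image: (B, 1, 84, 84)
        h = self.fc(self.conv(image))         # shared features: (B, 256)
        return h, self.depth_head(h)          # features + depth prediction

enc = HeteroEncoder()
feats, depth_pred = enc(torch.zeros(1, 1, 84, 84))  # -> (1, 256), (1, 7056)
```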
The method described in this thesis is based on the Actor-Critic algorithm and uses a convolutional neural network as a hetero-encoder between the image input and other observations from the environment, as sketched above. In an environment with sparse rewards, these neglected observations serve as target outputs for supervised learning, which provides the agent with denser training signals that bootstrap reinforcement learning. In addition, the loss from this supervised learning is fed back to the agent as a reward for its exploration behavior, called the label reward, to encourage the agent to explore unvisited environmental states. Finally, multiple neural networks are constructed with the Asynchronous Advantage Actor-Critic (A3C) algorithm, and the policy is learned with multiple agents.
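To make the label-reward mechanism concrete, here is a hedged sketch (reusing the HeteroEncoder above; the beta weight, the one-step TD form, and all names are illustrative assumptions, not the thesis's actual implementation) in which the supervised loss serves both as a training signal and as an intrinsic reward added to the environment reward; under A3C, several worker threads would run this update asynchronously against a shared model:

```python
import torch.nn as nn
import torch.nn.functional as F

class ActorCriticWithAux(nn.Module):
    """Policy and value heads on top of the HeteroEncoder sketched above."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.encoder = HeteroEncoder()
        self.policy = nn.Linear(256, n_actions)
        self.value = nn.Linear(256, 1)

    def forward(self, image):
        h, depth_pred = self.encoder(image)
        return self.policy(h), self.value(h).squeeze(-1), depth_pred

def one_step_losses(net, image, depth_target, action, env_reward, next_value,
                    gamma=0.99, beta=0.01):
    # next_value: bootstrap value of the successor state (assumed detached).
    logits, value, depth_pred = net(image)
    # Supervised signal: predict the neglected depth observation.
    sup_loss = F.mse_loss(depth_pred, depth_target)
    # "Label reward": the supervised loss, detached and scaled, fed back to
    # the agent as an intrinsic reward.
    label_reward = beta * sup_loss.detach()
    # One-step actor-critic update on the densified reward.
    td_target = env_reward + label_reward + gamma * next_value
    advantage = td_target - value
    log_prob = F.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
    policy_loss = -(log_prob * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    return policy_loss + 0.5 * value_loss + sup_loss
```

Because the prediction loss is highest in states the encoder has rarely seen, the label reward is largest there, which is what nudges the agent toward unvisited states.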
The method described in this thesis is compared with other deep reinforcement learning methods in an environment with sparse rewards and achieves better performance.