An Improved Deep Reinforcement Learning with Sparse Rewards

Master's === National Sun Yat-sen University === Department of Electrical Engineering === 107 === In reinforcement learning, how an agent should explore an environment with sparse rewards is a long-standing problem. The improved deep reinforcement learning method described in this thesis encourages an agent to explore unvisited environmental states in an environment...

Bibliographic Details
Main Authors: Lu-cheng Chi, 紀律呈
Other Authors: Kao-Shing Hwang
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/eq94pr
id ndltd-TW-107NSYS5442010
record_format oai_dc
spelling ndltd-TW-107NSYS54420102019-05-16T01:40:48Z http://ndltd.ncl.edu.tw/handle/eq94pr An Improved Deep Reinforcement Learning with Sparse Rewards 基於稀疏報酬改良深度加強式學習 Lu-cheng Chi 紀律呈 Master's National Sun Yat-sen University Department of Electrical Engineering 107 In reinforcement learning, how an agent should explore an environment with sparse rewards is a long-standing problem. This thesis describes an improved deep reinforcement learning method that encourages an agent to explore unvisited states in such an environment. In deep reinforcement learning, an agent directly uses an image observation from the environment as the input to the neural network. However, other observations that are usually neglected, such as depth, may provide valuable information. The proposed method is based on the Actor-Critic algorithm and uses a convolutional neural network as a hetero-encoder between the image input and the other observations from the environment. In an environment with sparse rewards, these neglected observations serve as the target output of supervised learning, which gives the agent denser training signals that bootstrap reinforcement learning. In addition, the supervised-learning loss is used as feedback on the agent’s exploration behavior, called the label reward, to encourage the agent to explore unvisited states. Finally, multiple neural networks are constructed with the Asynchronous Advantage Actor-Critic algorithm, and the policy is learned with multiple agents. Compared with other deep reinforcement learning methods in an environment with sparse rewards, the proposed method achieves better performance. Kao-Shing Hwang 黃國勝 2018 學位論文 ; thesis 43 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Sun Yat-sen University === Department of Electrical Engineering === 107 === In reinforcement learning, how an agent should explore an environment with sparse rewards is a long-standing problem. This thesis describes an improved deep reinforcement learning method that encourages an agent to explore unvisited states in such an environment. In deep reinforcement learning, an agent directly uses an image observation from the environment as the input to the neural network. However, other observations that are usually neglected, such as depth, may provide valuable information. The proposed method is based on the Actor-Critic algorithm and uses a convolutional neural network as a hetero-encoder between the image input and the other observations from the environment. In an environment with sparse rewards, these neglected observations serve as the target output of supervised learning, which gives the agent denser training signals that bootstrap reinforcement learning. In addition, the supervised-learning loss is used as feedback on the agent’s exploration behavior, called the label reward, to encourage the agent to explore unvisited states. Finally, multiple neural networks are constructed with the Asynchronous Advantage Actor-Critic algorithm, and the policy is learned with multiple agents. Compared with other deep reinforcement learning methods in an environment with sparse rewards, the proposed method achieves better performance.
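The label reward mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that the supervised-learning loss is fed back to the agent as a reward; the mean-squared-error loss, the scaling factor `beta`, the additive combination with the environment reward, and the function names below are all assumptions made for illustration, not the thesis's actual formulation.

```python
from typing import Sequence


def supervised_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and an observed auxiliary
    signal (for example, a flattened depth map). MSE is an assumption."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("predicted and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def shaped_reward(
    env_reward: float,
    predicted: Sequence[float],
    target: Sequence[float],
    beta: float = 0.1,  # hypothetical weight of the label reward
) -> float:
    """Environment reward plus a label reward proportional to the
    supervised loss. The additive form is an assumption."""
    label_reward = beta * supervised_loss(predicted, target)
    return env_reward + label_reward
```

Under these assumptions, a state whose auxiliary observation the network predicts poorly (plausibly one it has rarely visited) yields a larger bonus, while a well-predicted state leaves the sparse environment reward unchanged.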
author2 Kao-Shing Hwang
author_facet Kao-Shing Hwang
Lu-cheng Chi
紀律呈
author Lu-cheng Chi
紀律呈
author_sort Lu-cheng Chi
title An Improved Deep Reinforcement Learning with Sparse Rewards
title_sort improved deep reinforcement learning with sparse rewards
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/eq94pr