Action Segmentation and Learning by Inverse Reinforcement Learning

Master's thesis === National Sun Yat-sen University === Department of Electrical Engineering === Academic Year 104 (2015) === Reinforcement learning allows agents to learn behaviors through trial and error. However, as the difficulty of a task increases, its reward function becomes harder to define. By combining the concepts of the Adaboost classifier and Upper Confidence Bounds (UCB), a method based on inverse reinforcement learning is proposed to construct the reward function of a complex task. Inverse reinforcement learning allows the agent to reconstruct a reward function by imitating the interaction between an expert and the environment. During the imitation, the agent continuously compares the difference between the expert and itself, and the proposed method then assigns a specific weight to each state via Adaboost. Each weight is combined with the state confidence from UCB to construct an approximate reward function. This thesis uses a state-encoding method and action segmentation to simplify the problem, and then applies the proposed method to determine the optimal reward function. Finally, simulations of a maze environment and a soccer-robot environment are used to validate the proposed method and to show that it further reduces computation time.
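The abstract describes combining per-state Adaboost weights, driven by disagreement between the agent and the expert, with a UCB-style visit-count confidence to form an approximate reward. Below is a minimal Python sketch of one plausible reading of that construction; the function name `build_reward`, the weight-update rule, the sign convention, and the UCB bonus form are illustrative assumptions, not the thesis's actual algorithm.

```python
import math

# Hedged sketch: states where the agent's action disagrees with the
# expert's get Adaboost-style weight increases, and each weight is then
# scaled by a UCB-style confidence term derived from visit counts.
# All names and update rules here are assumptions for illustration.

def build_reward(expert_actions, agent_actions, visits, total_visits,
                 n_rounds=10, c=1.0):
    """Return an approximate per-state reward.

    expert_actions: dict state -> action demonstrated by the expert
    agent_actions:  dict state -> action currently chosen by the agent
    visits:         dict state -> agent's visit count for the state
    total_visits:   total number of state visits
    """
    states = list(expert_actions)
    # Uniform initial weights, as in Adaboost.
    w = {s: 1.0 / len(states) for s in states}

    for _ in range(n_rounds):
        # Weighted error: total weight on states where agent and expert disagree.
        err = sum(w[s] for s in states
                  if agent_actions.get(s) != expert_actions[s])
        err = min(max(err, 1e-9), 1 - 1e-9)       # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)   # Adaboost step size
        for s in states:
            mismatch = agent_actions.get(s) != expert_actions[s]
            # Mismatched states gain weight; matched states lose weight.
            w[s] *= math.exp(alpha if mismatch else -alpha)
        z = sum(w.values())
        w = {s: ws / z for s, ws in w.items()}    # renormalize to sum to 1

    reward = {}
    for s in states:
        n = visits.get(s, 0)
        # UCB-style confidence: rarely visited states get a larger bonus.
        conf = c * math.sqrt(math.log(total_visits + 1) / (n + 1))
        # Matching the expert earns positive reward scaled by the state's
        # Adaboost weight and UCB confidence; mismatching earns negative.
        sign = 1.0 if agent_actions.get(s) == expert_actions[s] else -1.0
        reward[s] = sign * w[s] * conf
    return reward

# Toy usage on a three-state problem (hypothetical data):
expert = {"s0": "up", "s1": "left", "s2": "up"}
agent = {"s0": "up", "s1": "right", "s2": "up"}
visits = {"s0": 12, "s1": 3, "s2": 25}
print(build_reward(expert, agent, visits, total_visits=40))
```

In this reading, the Adaboost loop concentrates weight on the states where imitation fails, while the UCB term keeps the reward honest about how much evidence each state's comparison rests on.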

Bibliographic Details
Title (Chinese): 透過反加強式學習模仿作行為分段及學習
Main Author: Hsuan-yi Chiang (江炫儀)
Other Authors: Kao-Shing Hwang (黃國勝)
Format: Others
Extent: 70 pages
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/24130256006959006664