Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm


Bibliographic Details
Main Authors: Ho, Chang-An, 何長安
Other Authors: Lin, Sheng-Fuu
Format: Others
Language: zh-TW
Published: 2009
Online Access:http://ndltd.ncl.edu.tw/handle/63234750154932788712
id ndltd-TW-097NCTU5591131
record_format oai_dc
spelling ndltd-TW-097NCTU55911312015-10-13T15:42:34Z http://ndltd.ncl.edu.tw/handle/63234750154932788712 Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm 基於安全性增強式學習之循序擾動學習演算法 Ho, Chang-An 何長安 Master's National Chiao Tung University Department of Electrical and Control Engineering 97 This thesis presents a sequential perturbation learning architecture based on safe reinforcement learning (SRL-SP), which uses the concept of line search to apply perturbations to each weight of a neural network. After the perturbations are applied, the value function of the pre-perturbation and post-perturbation networks is evaluated in order to update the weights. Applying perturbations keeps the solution from falling into local optima and from oscillating in the solution space, both of which reduce learning efficiency. In addition, within the reinforcement learning structure, Lyapunov design methods are used to set the learning objective and a predefined set of goal states. This greatly reduces the learning time; in other words, it rapidly guides the plant's state into the goal state. In the simulations, an n-mass inverted pendulum model serves as a humanoid robot model, demonstrating that the proposed method learns more effectively. Lin, Sheng-Fuu 林昇甫 2009 degree thesis ; thesis 89 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chiao Tung University === Department of Electrical and Control Engineering === 97 === This thesis presents a sequential perturbation learning architecture based on safe reinforcement learning (SRL-SP), which uses the concept of line search to apply perturbations to each weight of a neural network. After the perturbations are applied, the value function of the pre-perturbation and post-perturbation networks is evaluated in order to update the weights. Applying perturbations keeps the solution from falling into local optima and from oscillating in the solution space, both of which reduce learning efficiency. In addition, within the reinforcement learning structure, Lyapunov design methods are used to set the learning objective and a predefined set of goal states. This greatly reduces the learning time; in other words, it rapidly guides the plant's state into the goal state. In the simulations, an n-mass inverted pendulum model serves as a humanoid robot model, demonstrating that the proposed method learns more effectively.
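As an illustration of the weight-update scheme sketched in the abstract, the Python snippet below shows one plausible reading of a sequential perturbation step: each weight is perturbed in turn, the value function is evaluated before and after the perturbation, and the perturbation is kept only if it improves the value. The value function V, the step size delta, and the accept-if-better rule are assumptions made for this example, not details taken from the thesis itself.

    import numpy as np

    # Illustrative sketch only (not the thesis's actual implementation):
    # a line-search-style sequential perturbation update in which each
    # network weight is perturbed in turn and a perturbation is kept
    # only if it improves an estimated value function V.
    def sequential_perturbation_step(weights, V, delta=0.01):
        w = np.asarray(weights, dtype=float).copy()
        best_value = V(w)
        for i in range(w.size):
            for step in (delta, -delta):      # probe both directions, as in a line search
                trial = w.copy()
                trial.flat[i] += step
                trial_value = V(trial)        # compare pre- vs post-perturbation value
                if trial_value > best_value:  # keep only improving perturbations
                    w, best_value = trial, trial_value
                    break
        return w, best_value

    # Toy usage: maximize V(w) = -||w||^2, whose optimum is w = 0.
    if __name__ == "__main__":
        w = np.array([0.5, -0.3, 0.8])
        for _ in range(200):
            w, v = sequential_perturbation_step(w, lambda x: -np.sum(x**2))
        print(w, v)

In the toy run, repeated sequential perturbation steps drive the weights toward the maximizer to within the step size, which mirrors the abstract's claim that accepting only value-improving perturbations avoids oscillation in the solution space.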
author2 Lin, Sheng-Fuu
author_facet Lin, Sheng-Fuu
Ho, Chang-An
何長安
author Ho, Chang-An
何長安
spellingShingle Ho, Chang-An
何長安
Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
author_sort Ho, Chang-An
title Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
title_short Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
title_full Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
title_fullStr Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
title_full_unstemmed Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
title_sort safe reinforcement learning based sequential perturbation learning algorithm
publishDate 2009
url http://ndltd.ncl.edu.tw/handle/63234750154932788712
work_keys_str_mv AT hochangan safereinforcementlearningbasedsequentialperturbationlearningalgorithm
AT hézhǎngān safereinforcementlearningbasedsequentialperturbationlearningalgorithm
AT hochangan jīyúānquánxìngzēngqiángshìxuéxízhīxúnxùrǎodòngxuéxíyǎnsuànfǎ
AT hézhǎngān jīyúānquánxìngzēngqiángshìxuéxízhīxúnxùrǎodòngxuéxíyǎnsuànfǎ
_version_ 1717768430870855680