The Reinforcement Learning Behavior Unit Weights Searching based on Genetic Algorithm

Bibliographic Details
Main Authors: Tsung-Fei Tzou, 鄒璁飛
Other Authors: Kao-Shing Hwang
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/67380234386530334498
Description
Summary: Master's === National Chung Cheng University === Institute of Electrical Engineering === 95 === This thesis proposes a learning scheme, SGRL, that is based on a Stochastic Searching Network and a Genetic Algorithm (GA) and applies Reinforcement Learning to the problem of searching for action network weights. The SGRL learning scheme is a hybrid Genetic Algorithm: it integrates the Stochastic Searching Network with the GA to carry out the weight search for the Reinforcement Learning action network. Structurally, the SGRL learning system is composed of two integrated feed-forward networks. One, the critic network, assists the learning of the other, the action network, which is an ordinary neural network and determines the outputs (actions) of the SGRL learning system. Using the TD (Temporal Difference) prediction method, the critic network predicts the external reinforcement signal and provides a more informative internal reinforcement signal to the action network. The action network adapts itself with the GA, guided by the internal reinforcement signal and by a dynamic reference model of the plant. The key concept of the SGRL learning scheme is to formulate the internal reinforcement signal contributed by the reference plant model as the fitness function for the GA. Computer simulations of controlling the Acrobot system (an underactuated system, i.e. one possessing fewer actuators than degrees of freedom) and the mountain-car system were conducted to illustrate the performance and applicability of the proposed learning controller scheme.
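The idea of using a critic's internal reinforcement signal as a GA fitness function can be illustrated with a minimal Python sketch. It is not the thesis implementation: the Stochastic Searching Network and the reference plant model are not reproduced, the internal signal uses the common actor-critic form r + gamma * v(s') - v(s) as an assumption, and the names `internal_reinforcement`, `toy_fitness`, and `ga_step`, the one-dimensional toy plant, and the fixed critic are all hypothetical.

```python
import random


def internal_reinforcement(r, v_now, v_next, gamma=0.95):
    """TD-style internal signal (assumed form): external reward plus the
    discounted change in the critic's prediction."""
    return r + gamma * v_next - v_now


def toy_fitness(weights, state=1.0):
    """Hypothetical fitness: internal reinforcement earned by one action.

    Assumptions for illustration only: a linear 'action network'
    (action = w0 * state + w1), a plant s' = s + action, a fixed critic
    v(s) = -|s|, and zero external reward on this step.
    """
    action = weights[0] * state + weights[1]
    next_state = state + action
    critic = lambda s: -abs(s)
    return internal_reinforcement(0.0, critic(state), critic(next_state))


def ga_step(population, fitness_fn, rng, mutation_std=0.1, elite=2):
    """One GA generation: keep the best `elite` individuals unchanged, then
    fill the rest by uniform crossover plus Gaussian mutation."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    next_pop = [list(w) for w in ranked[:elite]]
    parents = ranked[: max(2, len(ranked) // 2)]
    while len(next_pop) < len(population):
        a, b = rng.sample(parents, 2)
        child = [rng.choice(pair) + rng.gauss(0.0, mutation_std)
                 for pair in zip(a, b)]
        next_pop.append(child)
    return next_pop


def demo(generations=30, size=20, seed=0):
    """Evolve weight vectors and return the best one found."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(size)]
    for _ in range(generations):
        pop = ga_step(pop, toy_fitness, rng)
    return max(pop, key=toy_fitness)
```

In this toy setting the fitness is highest when the action drives the state to zero, so calling `demo()` should return weights whose sum is close to -1. Because the elite individuals are copied unchanged, the best fitness in the population never decreases from one generation to the next.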