Q-learning with Continuous Action Value in Multi-agent Cooperation

Bibliographic Details
Main Authors: Yu-Hong Lin, 林咩
Other Authors: Kao-Shing Hwang
Format: Others
Language: en_US
Online Access: http://ndltd.ncl.edu.tw/handle/36513682200450671073
id ndltd-TW-094CCU05442040
record_format oai_dc
spelling ndltd-TW-094CCU054420402015-10-13T10:45:18Z http://ndltd.ncl.edu.tw/handle/36513682200450671073 Q-learning with Continuous Action Value in Multi-agent Cooperation 具連續行為的Q-learning應用於多重代理人之合作 Yu-Hong Lin 林咩 Master's === National Chung Cheng University === Graduate Institute of Electrical Engineering === 94 === In this thesis, we propose a Q-learning algorithm with a continuous action space and extend it to a multi-agent system. We apply the algorithm to a task in which two robots, connected by a straight bar, take actions independently; they must cooperate to reach the goal while avoiding obstacles in the environment. Conventional Q-learning requires a pre-defined, discrete state space, so it can represent only a finite number of states and actions. This is impractical because, in the real world, both the environment's states and the agents' actions are continuous, so discretized Q-learning cannot distinguish the variations among different situations that fall into the same state. We use the concept of the SRV (Stochastic Real-Valued) unit to train the action taken in each state, so the resulting actions are continuous. This brings the simulation closer to the real world, relaxes the need for a pre-defined action space in Q-learning, and yields better learning outcomes. Kao-Shing Hwang 黃國勝 Thesis ; thesis 46 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chung Cheng University === Graduate Institute of Electrical Engineering === 94 === In this thesis, we propose a Q-learning algorithm with a continuous action space and extend it to a multi-agent system. We apply the algorithm to a task in which two robots, connected by a straight bar, take actions independently; they must cooperate to reach the goal while avoiding obstacles in the environment. Conventional Q-learning requires a pre-defined, discrete state space, so it can represent only a finite number of states and actions. This is impractical because, in the real world, both the environment's states and the agents' actions are continuous, so discretized Q-learning cannot distinguish the variations among different situations that fall into the same state. We use the concept of the SRV (Stochastic Real-Valued) unit to train the action taken in each state, so the resulting actions are continuous. This brings the simulation closer to the real world, relaxes the need for a pre-defined action space in Q-learning, and yields better learning outcomes.
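
The abstract above pairs tabular Q-learning with an SRV-style stochastic real-valued unit so that each discrete state can emit a continuous action. The sketch below is a hypothetical Python illustration of that combination, not the thesis's actual implementation: the class name SRVQLearner, the use of a simple state-value critic, and the specific update rules are assumptions made for exposition only.

import numpy as np

# Minimal sketch (assumptions, not the thesis's formulation): a tabular critic over
# discrete states plus an SRV-style unit that keeps a real-valued mean action per
# state and explores with Gaussian noise. The mean is nudged toward sampled actions
# that produced a positive TD error, so the learned action is continuous even
# though the state space stays discrete.
class SRVQLearner:
    def __init__(self, n_states, alpha=0.1, gamma=0.95, beta=0.05, sigma=0.3):
        self.v = np.zeros(n_states)            # critic: value of each discrete state
        self.mu = np.zeros(n_states)           # actor: mean continuous action per state
        self.alpha, self.gamma = alpha, gamma  # critic step size and discount factor
        self.beta, self.sigma = beta, sigma    # actor step size and exploration std.

    def act(self, s):
        # SRV-style exploration: sample a real-valued action around the state's mean.
        return self.mu[s] + np.random.normal(0.0, self.sigma)

    def update(self, s, a, r, s_next):
        # One-step TD error of the critic.
        td_error = r + self.gamma * self.v[s_next] - self.v[s]
        self.v[s] += self.alpha * td_error
        # SRV-style actor update: move the mean toward the sampled action in
        # proportion to how much better than expected the outcome was.
        self.mu[s] += self.beta * td_error * (a - self.mu[s])

# Hypothetical usage with a Gym-like environment exposing integer state indices
# and accepting a scalar continuous action ('env' is assumed, not part of this record):
#   agent = SRVQLearner(n_states=100)
#   a = agent.act(s)
#   s_next, r = env.step(a)
#   agent.update(s, a, r, s_next)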
author2 Kao-Shing Hwang
author_facet Kao-Shing Hwang
Yu-Hong Lin
林咩
author Yu-Hong Lin
林咩
spellingShingle Yu-Hong Lin
林咩
Q-learning with Continuous Action Value in Multi-agent Cooperation
author_sort Yu-Hong Lin
title Q-learning with Continuous Action Value in Multi-agent Cooperation
title_short Q-learning with Continuous Action Value in Multi-agent Cooperation
title_full Q-learning with Continuous Action Value in Multi-agent Cooperation
title_fullStr Q-learning with Continuous Action Value in Multi-agent Cooperation
title_full_unstemmed Q-learning with Continuous Action Value in Multi-agent Cooperation
title_sort q-learning with continuous action value in multi-agent cooperation
url http://ndltd.ncl.edu.tw/handle/36513682200450671073
work_keys_str_mv AT yuhonglin qlearningwithcontinuousactionvalueinmultiagentcooperation
AT línmiē qlearningwithcontinuousactionvalueinmultiagentcooperation
AT yuhonglin jùliánxùxíngwèideqlearningyīngyòngyúduōzhòngdàilǐrénzhīhézuò
AT línmiē jùliánxùxíngwèideqlearningyīngyòngyúduōzhòngdàilǐrénzhīhézuò
_version_ 1716833108120043520