Q-learning with Continuous Action Value in Multi-agent Cooperation
Master's === National Chung Cheng University === Department of Electrical Engineering === 94 === In this thesis, we propose a Q-learning algorithm with a continuous action space and extend it to a multi-agent system. We apply the algorithm to a task in which two robots act independently while connected by a straight bar. They must...
Main Authors: Yu-Hong Lin (林咩)
Other Authors: Kao-Shing Hwang (黃國勝)
Format: Others
Language: en_US
Online Access: http://ndltd.ncl.edu.tw/handle/36513682200450671073
id: ndltd-TW-094CCU05442040
record_format: oai_dc
spelling: ndltd-TW-094CCU05442040 2015-10-13T10:45:18Z http://ndltd.ncl.edu.tw/handle/36513682200450671073 Q-learning with Continuous Action Value in Multi-agent Cooperation 具連續行為的Q-learning應用於多重代理人之合作 Yu-Hong Lin 林咩 Master's, National Chung Cheng University, Department of Electrical Engineering, 94 Kao-Shing Hwang 黃國勝 thesis 46 en_US
collection: NDLTD
language: en_US
format: Others
sources: NDLTD
description: Master's === National Chung Cheng University === Department of Electrical Engineering === 94 === In this thesis, we propose a Q-learning algorithm with a continuous action space and extend it to a multi-agent system. We apply the algorithm to a task in which two robots act independently while connected by a straight bar; they must cooperate to reach the goal and avoid the obstacles in the environment. Conventional Q-learning requires a pre-defined, discrete state space, so it has only a finite number of states and actions. This is impractical because, in the real world, both the environment's states and the agents' actions are continuous; when Q-learning is used to produce actions in such a world, it cannot precisely capture the variations among different situations that map to the same state. We use the concept of the SRV (Stochastic Real-Valued) unit to train the action in each state, so the resulting action is continuous. This brings the simulation closer to the real world; it also overcomes the limitation of Q-learning's pre-defined action space, resulting in a better learning outcome.
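As a rough, hypothetical sketch of the idea in the abstract (tabular Q-learning over discrete states whose per-state action comes from an SRV-style stochastic real-valued unit rather than a fixed discrete set), the Python below is an assumption of mine, not the thesis implementation: the class name `SRVQAgent`, the Gaussian exploration, and the exact update rule are illustrative only.

```python
import random


class SRVQAgent:
    """Illustrative sketch only (not the thesis code): tabular TD value
    learning over discrete states, combined with an SRV-style unit that
    outputs a continuous-valued action for each state."""

    def __init__(self, n_states, alpha=0.1, gamma=0.9, beta=0.05, sigma=0.2):
        self.q = [0.0] * n_states        # value estimate per discrete state
        self.mu = [0.0] * n_states       # continuous action mean per state
        self.alpha, self.gamma = alpha, gamma  # value step size, discount factor
        self.beta, self.sigma = beta, sigma    # action step size, exploration noise

    def act(self, state):
        # SRV unit: sample a real-valued action around the state's current mean
        return random.gauss(self.mu[state], self.sigma)

    def update(self, state, action, reward, next_state):
        # Standard TD(0) update of the state's value estimate
        td_error = reward + self.gamma * self.q[next_state] - self.q[state]
        self.q[state] += self.alpha * td_error
        # Nudge the action mean toward the sampled action when the outcome
        # beat the current estimate, and away from it when it fell short
        self.mu[state] += self.beta * td_error * (action - self.mu[state])
```

In the cooperative task described in the abstract, each of the two bar-connected robots would run its own such learner and act independently; how the shared reward and joint state are defined is specific to the thesis and not reproduced here.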
author2: Kao-Shing Hwang
author_facet: Kao-Shing Hwang; Yu-Hong Lin 林咩
author: Yu-Hong Lin 林咩
spellingShingle: Yu-Hong Lin 林咩 Q-learning with Continuous Action Value in Multi-agent Cooperation
author_sort: Yu-Hong Lin
title: Q-learning with Continuous Action Value in Multi-agent Cooperation
title_sort: q-learning with continuous action value in multi-agent cooperation
url: http://ndltd.ncl.edu.tw/handle/36513682200450671073
work_keys_str_mv: AT yuhonglin qlearningwithcontinuousactionvalueinmultiagentcooperation; AT línmiē qlearningwithcontinuousactionvalueinmultiagentcooperation; AT yuhonglin jùliánxùxíngwèideqlearningyīngyòngyúduōzhòngdàilǐrénzhīhézuò; AT línmiē jùliánxùxíngwèideqlearningyīngyòngyúduōzhòngdàilǐrénzhīhézuò
_version_: 1716833108120043520