Risk-Avoiding Reinforcement Learning
可避免風險之強化學習演算法 (Chinese title)

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === Academic year 99 (2011)
Main Author: Jung-Jung Yeh (葉蓉蓉)
Other Authors: Shou-de Lin (林守德)
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/11854708998176094577
id: ndltd-TW-099NTU05392120
Type: 學位論文 (degree thesis), 59 pages

Description:
Traditional reinforcement learning agents focus on maximizing the expected cumulative reward and ignore the distribution of the return. However, for some tasks people prefer actions that may yield less return but are more likely to avoid disaster. This thesis proposes defining risk as the expected loss and accordingly designs a risk-avoiding reinforcement learning agent. Our experiments show that such an agent can reduce several types of risk, such as the variance of the return, the maximal loss, and the probability of fatal errors. Defining risk in terms of loss makes it possible to reduce the credit risk faced by banks as well as the losses that arise in stock margin trading, which previous work could hardly cope with effectively.
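One plausible way to formalize this loss-based risk (the notation and the negative-part form of the loss are our assumptions, not taken verbatim from the thesis):

```latex
% Risk as expected loss: only the negative part of the return counts.
% R^\pi(s) is the (random) return from state s under policy \pi.
\[
  \mathrm{Risk}^{\pi}(s) \;=\; \mathbb{E}\bigl[\max(0,\,-R^{\pi}(s))\bigr]
\]
% A risk-avoiding agent then trades return off against expected loss,
% with \lambda \ge 0 controlling the level of risk aversion (assumed):
\[
  \pi^{*} \;=\; \arg\max_{\pi}\;
    \mathbb{E}\bigl[R^{\pi}(s)\bigr] \;-\; \lambda\,\mathrm{Risk}^{\pi}(s)
\]
```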
We design a Q-decomposed reinforcement learning system to handle the tradeoff between expected loss and expected return. The framework consists of two subagents and one arbiter: the subagents learn the expected loss and the expected return individually, and the arbiter evaluates the combined return and loss estimates of each action and takes the best one.
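A minimal sketch of this two-subagent-plus-arbiter architecture in Python, using tabular SARSA-style updates as in standard Q-decomposition; the class names, parameter names, and epsilon-greedy exploration are our assumptions, not the thesis's:

```python
import random
from collections import defaultdict

class Subagent:
    """Tabular learner for one component of the signal (the reward
    stream or the loss stream), updated SARSA-style on the action
    actually chosen by the arbiter."""
    def __init__(self, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)            # (state, action) -> estimate
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, signal, s2, a2, done):
        # Bootstrap on the arbiter's next action unless the episode ended.
        target = signal if done else signal + self.gamma * self.q[(s2, a2)]
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

class Arbiter:
    """Combines the subagents' per-action estimates and takes the best
    action; risk_weight scales how heavily expected loss counts."""
    def __init__(self, return_agent, loss_agent, risk_weight=1.0, epsilon=0.1):
        self.ret, self.loss = return_agent, loss_agent
        self.lam, self.epsilon = risk_weight, epsilon

    def select(self, s, actions):
        if random.random() < self.epsilon:     # epsilon-greedy exploration
            return random.choice(actions)
        return max(actions, key=lambda a:
                   self.ret.q[(s, a)] - self.lam * self.loss.q[(s, a)])
```

Here `risk_weight` plays the role of the risk-aversion weight λ above: at 0 the arbiter reduces to an ordinary return-maximizing learner, and larger values make it increasingly conservative.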
We perform two experiments: a grid world and simulated trades on the Taiwan electronics stock index. In the grid world, we evaluate the expected return and the expected loss of agents with different levels of risk aversion. In the stock-trading experiment, we compare the risk-avoiding agent with variance-penalized and risk-sensitive agents. The results show that our risk-avoiding agent not only reduces the expected loss but also cuts down other kinds of risk.
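A sketch of how such an agent might be trained and swept over risk-aversion levels, reusing the classes above; the environment interface (`reset`/`actions`/`step`) and the `GridWorld` name are hypothetical:

```python
def train(env, episodes=5000, risk_weight=1.0):
    # Assumed environment interface: reset() -> state,
    # actions(state) -> list of actions, step(action) -> (state, reward, done).
    ret_agent, loss_agent = Subagent(), Subagent()
    arbiter = Arbiter(ret_agent, loss_agent, risk_weight)
    for _ in range(episodes):
        s = env.reset()
        a = arbiter.select(s, env.actions(s))
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = a if done else arbiter.select(s2, env.actions(s2))
            # Split the scalar reward: the loss subagent sees only the
            # negative part, matching risk defined as expected loss.
            ret_agent.update(s, a, r, s2, a2, done)
            loss_agent.update(s, a, max(0.0, -r), s2, a2, done)
            s, a = s2, a2
    return arbiter

# Sweeping risk_weight reproduces a comparison across levels of
# risk aversion (GridWorld is a placeholder environment):
# for lam in (0.0, 0.5, 1.0, 2.0):
#     agent = train(GridWorld(), risk_weight=lam)
```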