Optimal planning with approximate model-based reinforcement learning
Model-based reinforcement learning methods make efficient use of samples by building a model of the environment and planning with it. Compared to model-free methods, they usually take fewer samples to converge to the optimal policy. Despite that efficiency, model-based methods may not learn the optimal policy due to structural modeling assumptions. In this thesis, we show that by combining model-based methods with hierarchically optimal recursive Q-learning (HORDQ) under a hierarchical reinforcement learning framework, the proposed approach learns the optimal policy even when the assumptions of the model are not all satisfied. The effectiveness of our approach is demonstrated with the Bus domain and Infinite Mario – a Java implementation of Nintendo’s Super Mario Brothers.
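The abstract above describes the standard model-based reinforcement learning loop: learn a model of the environment from experience and plan with that model to update the value function. The following is a minimal Dyna-Q-style sketch of that loop only, not of the HORDQ combination the thesis proposes; the `ChainEnv` toy environment, function names, and hyperparameters are illustrative assumptions rather than details taken from the thesis.

```python
# Illustrative sketch only: a tabular Dyna-Q agent (learn a model, plan with it,
# and update Q-values), shown on a toy chain environment. This is NOT the
# thesis's HORDQ algorithm; ChainEnv and all hyperparameters are assumptions.
import random
from collections import defaultdict


class ChainEnv:
    """Toy 5-state chain: move right to reach the goal (reward 1), else reward 0."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = left, action 1 = right
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def dyna_q(env, episodes=50, planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)   # Q[(state, action)] -> estimated value
    model = {}               # model[(state, action)] -> (reward, next_state)
    actions = [0, 1]

    def greedy(s):
        return max(actions, key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)

            # Direct reinforcement learning: one-step Q-learning update from real experience.
            target = r + (0.0 if done else gamma * max(q[(s2, b)] for b in actions))
            q[(s, a)] += alpha * (target - q[(s, a)])

            # Model learning: remember the observed transition (deterministic model).
            model[(s, a)] = (r, s2)

            # Planning: replay simulated transitions sampled from the learned model.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                ptarget = pr + gamma * max(q[(ps2, b)] for b in actions)
                q[(ps, pa)] += alpha * (ptarget - q[(ps, pa)])

            s = s2
    return q


if __name__ == "__main__":
    learned_q = dyna_q(ChainEnv())
    print({k: round(v, 3) for k, v in sorted(learned_q.items())})
```

In this style of method, each real transition feeds both a direct Q-learning update and a learned model that is replayed for extra planning updates, which is where the sample-efficiency claim in the abstract comes from; the thesis's contribution is to pair such model-based planning with hierarchically optimal recursive Q-learning so the agent can still recover the optimal policy when the model's structural assumptions do not hold.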
Main Author: | Kao, Hai Feng |
---|---|
Language: | English |
Published: | University of British Columbia, 2012 |
Online Access: | http://hdl.handle.net/2429/39889 |
id | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-39889 |
---|---|
record_format | oai_dc |
spelling | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-39889; 2014-03-26T03:38:30Z; Optimal planning with approximate model-based reinforcement learning; Kao, Hai Feng; 2012-01-04T20:01:51Z; 2011; 2012-01-04; 2012-05; Electronic Thesis or Dissertation; http://hdl.handle.net/2429/39889; eng; http://creativecommons.org/licenses/by-sa/3.0/; Attribution-NonCommercial 2.5 Canada; University of British Columbia |
collection | NDLTD |
language | English |
sources | NDLTD |
description | Model-based reinforcement learning methods make efficient use of samples by building a model of the environment and planning with it. Compared to model-free methods, they usually take fewer samples to converge to the optimal policy. Despite that efficiency, model-based methods may not learn the optimal policy due to structural modeling assumptions. In this thesis, we show that by combining model-based methods with hierarchically optimal recursive Q-learning (HORDQ) under a hierarchical reinforcement learning framework, the proposed approach learns the optimal policy even when the assumptions of the model are not all satisfied. The effectiveness of our approach is demonstrated with the Bus domain and Infinite Mario – a Java implementation of Nintendo’s Super Mario Brothers. |
author | Kao, Hai Feng |
title | Optimal planning with approximate model-based reinforcement learning |
publisher | University of British Columbia |
publishDate | 2012 |
url | http://hdl.handle.net/2429/39889 |