Optimal planning with approximate model-based reinforcement learning
Model-based reinforcement learning methods make efficient use of samples by building a model of the environment and planning with it. Compared to model-free methods, they usually take fewer samples to converge to the optimal policy. Despite that efficiency, model-based methods may not learn the optimal policy due to structural modeling assumptions. In this thesis, we show that by combining model-based methods with hierarchically optimal recursive Q-learning (HORDQ) under a hierarchical reinforcement learning framework, the proposed approach learns the optimal policy even when the assumptions of the model are not all satisfied. The effectiveness of our approach is demonstrated with the Bus domain and Infinite Mario – a Java implementation of Nintendo’s Super Mario Brothers.
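The abstract above describes the standard model-based reinforcement learning loop: learn a model of the environment from experience and plan with that model to update the value function. The following is a minimal Dyna-Q-style sketch of that loop only, not of the HORDQ combination the thesis proposes; the `ChainEnv` toy environment, function names, and hyperparameters are illustrative assumptions rather than details taken from the thesis.

```python
# Illustrative sketch only: a tabular Dyna-Q agent (learn a model, plan with it,
# and update Q-values), shown on a toy chain environment. This is NOT the
# thesis's HORDQ algorithm; ChainEnv and all hyperparameters are assumptions.
import random
from collections import defaultdict


class ChainEnv:
    """Toy 5-state chain: move right to reach the goal (reward 1), else reward 0."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = left, action 1 = right
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def dyna_q(env, episodes=50, planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)   # Q[(state, action)] -> estimated value
    model = {}               # model[(state, action)] -> (reward, next_state)
    actions = [0, 1]

    def greedy(s):
        return max(actions, key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)

            # Direct reinforcement learning: one-step Q-learning update from real experience.
            target = r + (0.0 if done else gamma * max(q[(s2, b)] for b in actions))
            q[(s, a)] += alpha * (target - q[(s, a)])

            # Model learning: remember the observed transition (deterministic model).
            model[(s, a)] = (r, s2)

            # Planning: replay simulated transitions sampled from the learned model.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                ptarget = pr + gamma * max(q[(ps2, b)] for b in actions)
                q[(ps, pa)] += alpha * (ptarget - q[(ps, pa)])

            s = s2
    return q


if __name__ == "__main__":
    learned_q = dyna_q(ChainEnv())
    print({k: round(v, 3) for k, v in sorted(learned_q.items())})
```

In this style of method, each real transition feeds both a direct Q-learning update and a learned model that is replayed for extra planning updates, which is where the sample-efficiency claim in the abstract comes from; the thesis's contribution is to pair such model-based planning with hierarchically optimal recursive Q-learning so the agent can still recover the optimal policy when the model's structural assumptions do not hold.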
Main Author: | Kao, Hai Feng |
---|---|
Language: | English |
Published: | University of British Columbia, 2012 |
Online Access: | http://hdl.handle.net/2429/39889 |
id | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-39889 |
---|---|
record_format | oai_dc |
spelling | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-39889; 2014-03-26T03:38:30Z; Optimal planning with approximate model-based reinforcement learning; Kao, Hai Feng; 2012-01-04T20:01:51Z; 2011; 2012-01-04; 2012-05; Electronic Thesis or Dissertation; http://hdl.handle.net/2429/39889; eng; http://creativecommons.org/licenses/by-sa/3.0/; Attribution-NonCommercial 2.5 Canada; University of British Columbia |
collection | NDLTD |
language | English |
sources | NDLTD |
description | Model-based reinforcement learning methods make efficient use of samples by building a model of the environment and planning with it. Compared to model-free methods, they usually take fewer samples to converge to the optimal policy. Despite that efficiency, model-based methods may not learn the optimal policy due to structural modeling assumptions. In this thesis, we show that by combining model-based methods with hierarchically optimal recursive Q-learning (HORDQ) under a hierarchical reinforcement learning framework, the proposed approach learns the optimal policy even when the assumptions of the model are not all satisfied. The effectiveness of our approach is demonstrated with the Bus domain and Infinite Mario – a Java implementation of Nintendo’s Super Mario Brothers. |
author | Kao, Hai Feng |
title | Optimal planning with approximate model-based reinforcement learning |
publisher | University of British Columbia |
publishDate | 2012 |
url | http://hdl.handle.net/2429/39889 |