Optimization-based Approximate Dynamic Programming

Reinforcement learning algorithms hold promise in many complex domains, such as resource management and planning under uncertainty. Most reinforcement learning algorithms are iterative: they successively approximate the solution based on a set of samples and features. Although these iterative algor...

Bibliographic Details
Main Author: Petrik, Marek
Format: Others
Published: ScholarWorks@UMass Amherst 2010
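Subjects: Approximate Dynamic Programming; Approximate Linear Programming; Markov Decision Problem; Mathematical Optimization; Reinforcement Learning; Computer Sciences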
Online Access: https://scholarworks.umass.edu/open_access_dissertations/308
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1302&context=open_access_dissertations
id ndltd-UMASS-oai-scholarworks.umass.edu-open_access_dissertations-1302
record_format oai_dc
spelling ndltd-UMASS-oai-scholarworks.umass.edu-open_access_dissertations-1302 2020-12-02T14:38:33Z Optimization-based Approximate Dynamic Programming Petrik, Marek 2010-09-01T07:00:00Z text application/pdf https://scholarworks.umass.edu/open_access_dissertations/308 https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1302&context=open_access_dissertations Open Access Dissertations ScholarWorks@UMass Amherst Approximate Dynamic Programming Approximate Linear Programming Markov Decision Problem Mathematical Optimization Reinforcement Learning Computer Sciences
collection NDLTD
format Others
sources NDLTD
topic Approximate Dynamic Programming
Approximate Linear Programming
Markov Decision Problem
Mathematical Optimization
Reinforcement Learning
Computer Sciences
description Reinforcement learning algorithms hold promise in many complex domains, such as resource management and planning under uncertainty. Most reinforcement learning algorithms are iterative: they successively approximate the solution based on a set of samples and features. Although these iterative algorithms can achieve impressive results in some domains, they are not sufficiently reliable for wide applicability; they often require extensive parameter tweaking to work well and provide only weak guarantees of solution quality. Some of the most interesting reinforcement learning algorithms are based on approximate dynamic programming (ADP). ADP, also known as value function approximation, approximates the value of being in each state. This thesis presents new reliable algorithms for ADP that use optimization instead of iterative improvement. Because these optimization-based algorithms explicitly seek solutions with favorable properties, they are easy to analyze, offer much stronger guarantees than iterative algorithms, and have few or no parameters to tweak. In particular, we improve on approximate linear programming, an existing method, and derive approximate bilinear programming, a new robust approximate method. The strong guarantees of optimization-based algorithms not only increase confidence in the solution quality, but also make it easier to combine the algorithms with other ADP components. The other components of ADP are samples and features used to approximate the value function. Relying on the simplified analysis of optimization-based methods, we derive new bounds on the error due to missing samples. These bounds are simpler, tighter, and more practical than the existing bounds for iterative algorithms and can be used to evaluate solution quality in practical settings. Finally, we propose homotopy methods that use the sampling bounds to automatically select good approximation features for optimization-based algorithms. Automatic feature selection significantly increases the flexibility and applicability of the proposed ADP methods. The methods presented in this thesis can potentially be used in many practical applications in artificial intelligence, operations research, and engineering. Our experimental results show that optimization-based methods may perform well on resource-management problems and standard benchmark problems and therefore represent an attractive alternative to traditional iterative methods.
author Petrik, Marek
title Optimization-based Approximate Dynamic Programming
publisher ScholarWorks@UMass Amherst
publishDate 2010
url https://scholarworks.umass.edu/open_access_dissertations/308
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1302&context=open_access_dissertations
work_keys_str_mv AT petrikmarek optimizationbasedapproximatedynamicprogramming
_version_ 1719365825553498112