Computational methods for Markov decision problems
In this thesis we study computational methods for finite discounted Markov decision problems and finite discounted parametric Markov decision problems over an infinite horizon. For the former problem our emphasis is on finding methods to significantly reduce the effort required to determine an optim...
Main Author: | |
---|---|
Language: | English |
Published: |
2010
|
Online Access: | http://hdl.handle.net/2429/22353 |
Summary: | In this thesis we study computational methods for finite discounted Markov decision problems and finite discounted parametric Markov decision problems over an infinite horizon. For the former problem our emphasis is on finding methods to significantly reduce the effort required to determine an optimal policy. We discuss the implementation of Porteus' scalar extrapolation methods in the modified policy iteration algorithm and show that the results using only a final scalar extrapolation will be the same as those obtained by applying scalar extrapolation at each iteration and then using a final scalar extrapolation. Action elimination procedures for policy iteration and modified policy iteration algorithms are presented. The purpose of these techniques is to reduce the size of the action space to be searched in the improvement phase of the algorithm. A method for eliminating non-optimal actions for all subsequent iterations using upper and lower bounds on the optimal expected total discounted return is presented along with procedures for eliminating actions that cannot be part of the policy chosen in the improvement phase of the next iteration. A numerical comparison of these procedures on Howard's automobile replacement problem and on a large randomly generated problem suggests that using modified policy iteration together with one of the single iteration elimination procedures will lead to large savings in the computational time for problems with large state spaces. Modifications of the algorithm to reduce storage space are also discussed. For the finite discounted Markov decision problems in which the reward vector is parameterized by a scalar we present an algorithm to determine the optimal policy for each value of the parameter within an interval. The algorithm is based on using approximations of values to resolve difficulties caused by roundoff error. Also, several action elimination
procedures are presented for this problem. Bi-criterion Markov decision problems and Markov decision problems with a single constraint are formulated
as parametric Markov decision problems. A numerical comparison of algorithms with and without action elimination procedures is carried out on a two criterion version of Howard's automobile replacement problem. The results suggest that the algorithm with one of the action elimination procedures will lead to efficient solution of this problem. === Business, Sauder School of === Operations and Logistics (OPLOG), Division of === Graduate |
---|