Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with...
Main Authors: | Quanxin Zhu, Xinsong Yang, Chuangxia Huang |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2009-01-01 |
Series: | Abstract and Applied Analysis |
Online Access: | http://dx.doi.org/10.1155/2009/103723 |
id |
doaj-e8d529083c7943c5adc1318014d46c1e |
---|---|
record_format |
Article |
spelling |
doaj-e8d529083c7943c5adc1318014d46c1e; 2020-11-24T21:21:48Z; eng; Hindawi Limited; Abstract and Applied Analysis; 1085-3375; 1687-0409; 2009-01-01; 2009; 10.1155/2009/103723; 103723; Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces; Quanxin Zhu (0), Xinsong Yang (1), Chuangxia Huang (2); (0) Department of Mathematics, Ningbo University, Ningbo 315211, China; (1) Department of Mathematics, Honghe University, Mengzi 661100, China; (2) The College of Mathematics and Computing Science, Changsha University of Science and Technology, Changsha 410076, China; We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with is expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then under two slightly different sets of conditions we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.; http://dx.doi.org/10.1155/2009/103723 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Quanxin Zhu Xinsong Yang Chuangxia Huang |
title |
Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces |
publisher |
Hindawi Limited |
series |
Abstract and Applied Analysis |
issn |
1085-3375 1687-0409 |
publishDate |
2009-01-01 |
description |
We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with is expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then under two slightly different sets of conditions we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation. |
url |
http://dx.doi.org/10.1155/2009/103723 |