A leader-follower partially observed Markov game

Full description

The intent of this dissertation is to generate a set of non-dominated finite-memory policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite-horizon, total discounted reward, partially observed Markov game (POMG). Each agent’s policy assumes that the agent knows its current and recent state values, its recent actions, and the current and recent, possibly inaccurate, observations of the other agent’s state. For each candidate finite-memory leader policy, we assume the follower, fully aware of the leader policy, determines a policy that optimizes the follower’s criterion. The leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process that we use to determine the follower’s best-response policy for a given leader policy. We then present a value determination procedure to evaluate the leader’s performance for a given leader policy, from which a non-dominated set of leader policies can be selected by existing heuristic approaches. We then analyze how the value of the leader’s criterion changes due to changes in the leader’s quality of observation of the follower. We give conditions that ensure improved observation quality will improve the leader’s value function, assuming that changes in the observation quality do not cause the follower to change its policy. We show that discontinuities in the value of the leader’s criterion, as a function of observation quality, can occur when the change in observation quality is significant enough for the follower to change its policy. We present conditions that determine when a discontinuity may occur and conditions that guarantee a discontinuity will not degrade the leader’s performance. This framework has been used to develop a dynamic risk analysis approach for U.S. food supply chains and to create and compare supply chain designs and sequential control strategies for risk mitigation.
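
To illustrate the value determination step only (this is a minimal sketch, not the dissertation's actual procedure), the Python fragment below evaluates a leader's infinite-horizon discounted value once a leader policy and a follower response are both held fixed: with both finite-memory policies frozen, the controlled system reduces to a Markov chain, and the leader's value vector solves a linear system. All sizes, probabilities, rewards, and the discount factor here are illustrative assumptions.

    # Sketch (assumed data, not the dissertation's model): discounted policy
    # evaluation for one fixed leader-policy / follower-response pair.
    import numpy as np

    gamma = 0.95        # discount factor (assumed)
    n_states = 4        # joint memory/state nodes of the induced chain (assumed)

    rng = np.random.default_rng(0)

    # Transition matrix of the Markov chain induced by the fixed policy pair;
    # rows are normalized so each sums to 1.
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)

    # Leader's expected one-step reward in each joint state (assumed values).
    r = rng.random(n_states)

    # Value determination: solve (I - gamma * P) v = r for the value vector v.
    v = np.linalg.solve(np.eye(n_states) - gamma * P, r)

    print("Leader value per joint state:", np.round(v, 3))
    print("Value under a uniform initial distribution:",
          round(float(np.full(n_states, 1.0 / n_states) @ v), 3))

Repeating such an evaluation across candidate leader policies (each paired with its follower best response) is what would allow dominated leader policies to be screened out.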

Bibliographic Details
Main Author: Chang, Yanling
Other Authors: Erera, Alan L.; White III, Chelsea C.
Format: Others
Published: Georgia Institute of Technology 2016
Subjects: Risk analysis; Markov decision process; Real-time decision making; Value of information
Online Access: http://hdl.handle.net/1853/54407