Learning to Play Cooperative Games via Reinforcement Learning

Accomplishing tasks with multiple learners has long been a goal of the multiagent systems and machine learning communities. Reinforcement learning is one of the main approaches taken, but because of various conditions and restrictions, reinforcement learning in multiagent settings has not achieved the same level of success as its single-agent counterpart.

This thesis aims to improve coordination among agents in cooperative games by improving reinforcement learning algorithms in several ways. I begin by examining certain pathologies that can cause reinforcement learning to fail in cooperative games, in particular the pathology of relative overgeneralization. Under relative overgeneralization, agents do not learn to collaborate optimally because, during learning, each agent converges to behaviors that are robust against the other agent's exploratory (and thus random), rather than optimal, choices. One solution is so-called lenient learning, in which agents forgive their teammates' poor choices early in the learning process. In the first part of the thesis, I develop a lenient learning method to deal with relative overgeneralization in independent-learner settings with small stochastic games and discrete actions.

I then examine issues in a more complex multiagent domain involving parameterized-action Markov decision processes, motivated by the RoboCup 2D simulation league. I propose two methods, one batch method and one actor-critic method, both based on state-of-the-art reinforcement learning algorithms, and show experimentally that they train agents significantly more sample-efficiently than more common methods.

I then broaden the parameterized-action scenario to repeated and stochastic games with continuous actions. I show how relative overgeneralization prevents the multiagent actor-critic model from learning optimal behaviors and demonstrate how Soft Q-Learning can solve this problem in repeated games.

Finally, I extend imitation learning to the multiagent setting to address related issues in stochastic games, and prove that, given demonstrations from an expert, multiagent imitation learning is exactly the multiagent actor-critic model in the Maximum Entropy Reinforcement Learning framework. I further show that when the demonstration samples meet certain conditions, the relative overgeneralization problem can be avoided during learning.
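To make the relative overgeneralization pathology concrete, the sketch below uses the climbing game, a standard 3x3 cooperative matrix game from the multiagent learning literature; it is an illustration under assumed payoffs, not a reproduction of the games or of the lenient-learning algorithm developed in the thesis. Against a uniformly exploring partner, the jointly optimal action looks worst on average, so independent greedy learners drift to a safe but suboptimal joint action; an optimistic, "lenient" update that ignores low rewards early on recovers the optimum.

# Hypothetical illustration (Python/NumPy): relative overgeneralization in the
# climbing game, plus a crude lenient fix. Payoff values are the standard
# climbing-game numbers from the literature, not taken from the thesis.
import numpy as np

# payoff[i, j] = shared reward when agent A plays row i and agent B plays column j.
payoff = np.array([[ 11., -30.,  0.],
                   [-30.,   7.,  6.],
                   [  0.,   0.,  5.]])

# Against a uniformly exploring partner, each action is judged by its average
# payoff, so the jointly optimal action 0 (reward 11 at (0, 0)) looks worst:
print(payoff.mean(axis=1))  # approx. [-6.33, -5.67, 1.67] -> action 2 looks best

rng = np.random.default_rng(0)
q_a = np.zeros(3)  # independent learners: each agent keeps only its own value table
q_b = np.zeros(3)
for _ in range(2000):
    eps = 0.3  # epsilon-greedy exploration
    a = rng.integers(3) if rng.random() < eps else int(q_a.argmax())
    b = rng.integers(3) if rng.random() < eps else int(q_b.argmax())
    r = payoff[a, b]
    # Lenient (here: fully optimistic) update: ignore low rewards and remember
    # only the best payoff seen so far for each of the agent's own actions.
    q_a[a] = max(q_a[a], r)
    q_b[b] = max(q_b[b], r)

print(int(q_a.argmax()), int(q_b.argmax()))  # 0 0 -> the optimal joint action

Actual lenient learners decay this optimism over time (for example, with per-action temperatures), since permanent optimism misbehaves in stochastic games; the max-based update above is only the simplest caricature of the idea.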


Bibliographic Details
Main Author: Wei, Ermo
Language: EN
Published: George Mason University, 2019
Subjects: Artificial intelligence | Computer science
Online Access: http://pqdtopen.proquest.com/#viewpdf?dispub=13420351
id ndltd-PROQUEST-oai-pqdtoai.proquest.com-13420351
record_format oai_dc
collection NDLTD
language EN
sources NDLTD
topic Artificial intelligence|Computer science
spellingShingle Artificial intelligence|Computer science
Wei, Ermo
Learning to Play Cooperative Games via Reinforcement Learning
description Accomplishing tasks with multiple learners has long been a goal of the multiagent systems and machine learning communities. Reinforcement learning is one of the main approaches taken, but because of various conditions and restrictions, reinforcement learning in multiagent settings has not achieved the same level of success as its single-agent counterpart.
This thesis aims to improve coordination among agents in cooperative games by improving reinforcement learning algorithms in several ways. I begin by examining certain pathologies that can cause reinforcement learning to fail in cooperative games, in particular the pathology of relative overgeneralization. Under relative overgeneralization, agents do not learn to collaborate optimally because, during learning, each agent converges to behaviors that are robust against the other agent's exploratory (and thus random), rather than optimal, choices. One solution is so-called lenient learning, in which agents forgive their teammates' poor choices early in the learning process. In the first part of the thesis, I develop a lenient learning method to deal with relative overgeneralization in independent-learner settings with small stochastic games and discrete actions.
I then examine issues in a more complex multiagent domain involving parameterized-action Markov decision processes, motivated by the RoboCup 2D simulation league. I propose two methods, one batch method and one actor-critic method, both based on state-of-the-art reinforcement learning algorithms, and show experimentally that they train agents significantly more sample-efficiently than more common methods.
I then broaden the parameterized-action scenario to repeated and stochastic games with continuous actions. I show how relative overgeneralization prevents the multiagent actor-critic model from learning optimal behaviors and demonstrate how Soft Q-Learning can solve this problem in repeated games.
Finally, I extend imitation learning to the multiagent setting to address related issues in stochastic games, and prove that, given demonstrations from an expert, multiagent imitation learning is exactly the multiagent actor-critic model in the Maximum Entropy Reinforcement Learning framework. I further show that when the demonstration samples meet certain conditions, the relative overgeneralization problem can be avoided during learning.
author Wei, Ermo
author_facet Wei, Ermo
author_sort Wei, Ermo
title Learning to Play Cooperative Games via Reinforcement Learning
title_short Learning to Play Cooperative Games via Reinforcement Learning
title_full Learning to Play Cooperative Games via Reinforcement Learning
title_fullStr Learning to Play Cooperative Games via Reinforcement Learning
title_full_unstemmed Learning to Play Cooperative Games via Reinforcement Learning
title_sort learning to play cooperative games via reinforcement learning
publisher George Mason University
publishDate 2019
url http://pqdtopen.proquest.com/#viewpdf?dispub=13420351
work_keys_str_mv AT weiermo learningtoplaycooperativegamesviareinforcementlearning
_version_ 1719000357589221376