Learning to Play Cooperative Games via Reinforcement Learning

Accomplishing tasks with multiple learners has long been a goal of the multiagent systems and machine learning communities. Reinforcement learning is one of the main approaches taken, but because of various conditions and restrictions, reinforcement learning in multiagent settings has not achieved the same level of success as its single-agent counterpart.

This thesis aims to improve coordination among agents in cooperative games by improving reinforcement learning algorithms in several ways. I begin by examining certain pathologies that can cause reinforcement learning to fail in cooperative games, in particular the pathology of relative overgeneralization. Under relative overgeneralization, agents do not learn to collaborate optimally because, during learning, each agent converges to behaviors that are robust against the other agent's exploratory (and thus random), rather than optimal, choices. One solution is so-called lenient learning, in which agents forgive their teammates' poor choices early in the learning process. In the first part of the thesis, I develop a lenient learning method to deal with relative overgeneralization in independent-learner settings with small stochastic games and discrete actions.

I then examine issues in a more complex multiagent domain involving parameterized-action Markov decision processes, motivated by the RoboCup 2D simulation league. I propose two methods, one batch method and one actor-critic method, both based on state-of-the-art reinforcement learning algorithms, and show experimentally that they train agents significantly more sample-efficiently than more common methods.

I then broaden the parameterized-action scenario to repeated and stochastic games with continuous actions. I show how relative overgeneralization prevents the multiagent actor-critic model from learning optimal behaviors and demonstrate how Soft Q-Learning can solve this problem in repeated games.

Finally, I extend imitation learning to the multiagent setting to address related issues in stochastic games, and prove that, given demonstrations from an expert, multiagent imitation learning is exactly the multiagent actor-critic model in the Maximum Entropy Reinforcement Learning framework. I further show that when the demonstration samples meet certain conditions, the relative overgeneralization problem can be avoided during learning.
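To make the relative overgeneralization pathology concrete, the sketch below uses the climbing game, a standard 3x3 cooperative matrix game from the multiagent learning literature; it is an illustration under assumed payoffs, not a reproduction of the games or of the lenient-learning algorithm developed in the thesis. Against a uniformly exploring partner, the jointly optimal action looks worst on average, so independent greedy learners drift to a safe but suboptimal joint action; an optimistic, "lenient" update that ignores low rewards early on recovers the optimum.

# Hypothetical illustration (Python/NumPy): relative overgeneralization in the
# climbing game, plus a crude lenient fix. Payoff values are the standard
# climbing-game numbers from the literature, not taken from the thesis.
import numpy as np

# payoff[i, j] = shared reward when agent A plays row i and agent B plays column j.
payoff = np.array([[ 11., -30.,  0.],
                   [-30.,   7.,  6.],
                   [  0.,   0.,  5.]])

# Against a uniformly exploring partner, each action is judged by its average
# payoff, so the jointly optimal action 0 (reward 11 at (0, 0)) looks worst:
print(payoff.mean(axis=1))  # approx. [-6.33, -5.67, 1.67] -> action 2 looks best

rng = np.random.default_rng(0)
q_a = np.zeros(3)  # independent learners: each agent keeps only its own value table
q_b = np.zeros(3)
for _ in range(2000):
    eps = 0.3  # epsilon-greedy exploration
    a = rng.integers(3) if rng.random() < eps else int(q_a.argmax())
    b = rng.integers(3) if rng.random() < eps else int(q_b.argmax())
    r = payoff[a, b]
    # Lenient (here: fully optimistic) update: ignore low rewards and remember
    # only the best payoff seen so far for each of the agent's own actions.
    q_a[a] = max(q_a[a], r)
    q_b[b] = max(q_b[b], r)

print(int(q_a.argmax()), int(q_b.argmax()))  # 0 0 -> the optimal joint action

Actual lenient learners decay this optimism over time (for example, with per-action temperatures), since permanent optimism misbehaves in stochastic games; the max-based update above is only the simplest caricature of the idea.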


Bibliographic Details
Main Author: Wei, Ermo
Language: EN
Published: George Mason University, 2019
Subjects: Artificial intelligence | Computer science
Online Access: http://pqdtopen.proquest.com/#viewpdf?dispub=13420351
id ndltd-PROQUEST-oai-pqdtoai.proquest.com-13420351
record_format oai_dc
collection NDLTD
language EN
sources NDLTD
topic Artificial intelligence|Computer science
spellingShingle Artificial intelligence|Computer science
Wei, Ermo
Learning to Play Cooperative Games via Reinforcement Learning
description Accomplishing tasks with multiple learners has long been a goal of the multiagent systems and machine learning communities. Reinforcement learning is one of the main approaches taken, but because of various conditions and restrictions, reinforcement learning in multiagent settings has not achieved the same level of success as its single-agent counterpart.
This thesis aims to improve coordination among agents in cooperative games by improving reinforcement learning algorithms in several ways. I begin by examining certain pathologies that can cause reinforcement learning to fail in cooperative games, in particular the pathology of relative overgeneralization. Under relative overgeneralization, agents do not learn to collaborate optimally because, during learning, each agent converges to behaviors that are robust against the other agent's exploratory (and thus random), rather than optimal, choices. One solution is so-called lenient learning, in which agents forgive their teammates' poor choices early in the learning process. In the first part of the thesis, I develop a lenient learning method to deal with relative overgeneralization in independent-learner settings with small stochastic games and discrete actions.
I then examine issues in a more complex multiagent domain involving parameterized-action Markov decision processes, motivated by the RoboCup 2D simulation league. I propose two methods, one batch method and one actor-critic method, both based on state-of-the-art reinforcement learning algorithms, and show experimentally that they train agents significantly more sample-efficiently than more common methods.
I then broaden the parameterized-action scenario to repeated and stochastic games with continuous actions. I show how relative overgeneralization prevents the multiagent actor-critic model from learning optimal behaviors and demonstrate how Soft Q-Learning can solve this problem in repeated games.
Finally, I extend imitation learning to the multiagent setting to address related issues in stochastic games, and prove that, given demonstrations from an expert, multiagent imitation learning is exactly the multiagent actor-critic model in the Maximum Entropy Reinforcement Learning framework. I further show that when the demonstration samples meet certain conditions, the relative overgeneralization problem can be avoided during learning.
author Wei, Ermo
author_facet Wei, Ermo
author_sort Wei, Ermo
title Learning to Play Cooperative Games via Reinforcement Learning
title_short Learning to Play Cooperative Games via Reinforcement Learning
title_full Learning to Play Cooperative Games via Reinforcement Learning
title_fullStr Learning to Play Cooperative Games via Reinforcement Learning
title_full_unstemmed Learning to Play Cooperative Games via Reinforcement Learning
title_sort learning to play cooperative games via reinforcement learning
publisher George Mason University
publishDate 2019
url http://pqdtopen.proquest.com/#viewpdf?dispub=13420351
work_keys_str_mv AT weiermo learningtoplaycooperativegamesviareinforcementlearning
_version_ 1719000357589221376