Mutual Reinforcement Learning

Bibliographic Details
Main Author: Reid, Cameron
Other Authors: Mukhopadhyay, Snehasis
Language: en_US
Published: 2021
Online Access:http://hdl.handle.net/1805/25957
http://dx.doi.org/10.7912/C2/11
Description
Summary: Indiana University-Purdue University Indianapolis (IUPUI). Mutual learning is an emerging field in intelligent systems which takes inspiration from naturally intelligent agents and attempts to explore how agents can communicate and cooperate to share information and learn more quickly. While agents in many biological systems have little trouble learning from one another, it is not immediately obvious how artificial agents would achieve similar learning. In this thesis, I explore how agents learn to interact with complex systems. I further explore how these complex learning agents may be able to transfer knowledge to one another to improve their learning performance when they are learning together and have the power of communication. While significant research has been done on the problem of knowledge transfer, the existing literature is concerned either with supervised learning tasks or with relatively simple discrete reinforcement learning. The work presented here is, to my knowledge, the first which admits continuous state spaces and deep reinforcement learning techniques. The first contribution of this thesis, presented in Chapter 2, is a modified version of deep Q-learning which demonstrates improved learning performance due to the addition of a mutual learning term that penalizes disagreement between mutually learning agents. The second contribution, in Chapter 3, presents work describing effective communication between agents which use fundamentally different knowledge representations and systems of learning (model-free deep Q-learning and model-based adaptive dynamic programming), and I discuss how the agents can mathematically negotiate their trust in one another to achieve superior learning performance. I conclude with a discussion of the promise shown by this area of research and of problems which I believe are exciting directions for future research.
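
To make the first contribution concrete, the following is a minimal sketch of what a deep Q-learning loss augmented with a mutual learning term might look like. It assumes a squared-error penalty on the disagreement between two agents' Q-value estimates, weighted by a fixed coefficient mu; the penalty form, the weighting scheme, and all function and parameter names below are illustrative assumptions, not the thesis's actual formulation.

    import torch
    import torch.nn as nn

    # Hypothetical sketch: standard DQN temporal-difference loss for
    # agent A, plus a mutual learning term that penalizes disagreement
    # with agent B's Q-values on the same transitions. The coefficient
    # `mu` is an assumed fixed weight; in spirit it plays the role of
    # the trust in the other agent that Chapter 3 negotiates adaptively.
    def mutual_dqn_loss(q_net_a, q_net_b, target_net_a,
                        states, actions, rewards, next_states, dones,
                        gamma=0.99, mu=0.1):
        # Q-values agent A assigns to the actions actually taken.
        q_a = q_net_a(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # Standard DQN bootstrap target from A's target network.
        with torch.no_grad():
            next_q = target_net_a(next_states).max(dim=1).values
            td_target = rewards + gamma * (1.0 - dones) * next_q
        td_loss = nn.functional.mse_loss(q_a, td_target)

        # Mutual learning term: agent B's estimates are held fixed
        # during A's update, and A is pulled toward agreement with B.
        with torch.no_grad():
            q_b = q_net_b(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        disagreement = nn.functional.mse_loss(q_a, q_b)

        return td_loss + mu * disagreement

Each agent would compute this loss against the other (swapping the roles of A and B) and take its own gradient step, so that the two learners regularize one another while still fitting their own temporal-difference targets.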