Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations
Main Authors: Francisco Martinez-Gil, Miguel Lozano, Ignacio García-Fernández, Pau Romero, Dolors Serra, Rafael Sebastián
Format: Article
Language: English
Published: MDPI AG, 2020-09-01
Series: Mathematics
ISSN: 2227-7390
Subjects: inverse reinforcement learning; optimization; causal entropy; reinforcement learning; learning by demonstration; pedestrian simulation
Source: DOAJ
Online Access: https://www.mdpi.com/2227-7390/8/9/1479
Affiliation (all authors): Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
DOI: 10.3390/math8091479
Description:
Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function, expressed as a numeric table or a function approximator, and the learned behavior is then derived as the greedy policy with respect to this value function. Nevertheless, the learned policy sometimes fails to meet expectations, and authoring it is difficult and unsafe, because modifying a single value or parameter of the learned value function has unpredictable consequences across the space of policies it represents. This rules out direct manipulation of the learned value function as a way to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces into the learning process in order to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to incorporate information from real pedestrian trajectories into the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with behaviors learned using a classic Reinforcement Learning algorithm, Sarsa(λ), shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.
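The abstract names the two mechanisms at play: a soft (maximum-causal-entropy) Bellman backup that replaces the greedy maximization of classic Temporal Difference methods, and a reward function fitted so that the resulting stochastic policy reproduces the feature counts of the demonstrated trajectories. Below is a minimal tabular sketch of that combination in Python. The function names, the linear reward features `phi`, the placeholder transition model `P`, and the uniform state weighting are illustrative assumptions for a toy MDP, not the paper's MARL-Ped setup.

```python
import numpy as np

def soft_q_iteration(R, P, gamma=0.95, alpha=1.0, iters=200):
    """Soft value iteration using the maximum-entropy Bellman backup.

    R: (S, A) reward table; P: (S, A, S) transition probabilities.
    Returns the soft Q-values and the induced stochastic policy.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Soft state value replaces the greedy max of classic TD methods:
        # V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
        V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
        Q = R + gamma * (P @ V)  # expected soft value of the successor state
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    pi = np.exp((Q - V[:, None]) / alpha)  # pi(a|s) proportional to exp(Q/alpha)
    return Q, pi / pi.sum(axis=1, keepdims=True)

def maxent_irl(phi, P, demos, gamma=0.95, lr=0.1, epochs=100):
    """Fit linear reward weights theta so that the soft-optimal policy
    matches the feature counts of the demonstrated (real) trajectories.

    phi: (S, A, K) state-action features; demos: list of [(s, a), ...] pairs.
    """
    S, A, K = phi.shape
    theta = np.zeros(K)
    # Empirical feature expectations of the real trajectories
    f_demo = np.mean([phi[s, a] for traj in demos for (s, a) in traj], axis=0)
    for _ in range(epochs):
        R = phi @ theta  # linear reward model: r(s, a) = theta . phi(s, a)
        _, pi = soft_q_iteration(R, P, gamma)
        # Feature expectations under the current policy (a uniform state
        # weighting is used here for brevity; the full algorithm propagates
        # the policy's state-visitation frequencies instead)
        f_pi = (pi[..., None] * phi).sum(axis=1).mean(axis=0)
        theta += lr * (f_demo - f_pi)  # gradient ascent on the max-ent likelihood
    return theta
```

In a toy navigation grid, for instance, `demos` would hold the real trajectories discretized into (state, action) pairs, and the recovered `theta` would define the reward under which the soft-optimal policy best explains them; that policy, rather than a hand-edited value function, then drives the simulated pedestrians.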