An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization
Main Authors: | Rousslan Fernand Julien Dossa, Shengyi Huang, Santiago Ontanon, Takashi Matsubara |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Subjects: | Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning |
Online Access: | https://ieeexplore.ieee.org/document/9520424/ |
id |
doaj-bcdbb9c7e9f241aea5725e923918ef61 |
---|---|
record_format |
Article |
spelling |
Record ID: doaj-bcdbb9c7e9f241aea5725e923918ef61 (record timestamp 2021-08-30T23:00:40Z)
Language: eng
Published in: IEEE Access (IEEE), ISSN 2169-3536, vol. 9, 2021-01-01, pp. 117981-117992
DOI: 10.1109/ACCESS.2021.3106662 (IEEE article no. 9520424)
Title: An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization
Authors and affiliations:
Rousslan Fernand Julien Dossa (https://orcid.org/0000-0003-0572-692X), Graduate School of System Informatics, Kobe University, Hyogo, Japan
Shengyi Huang, College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Santiago Ontanon, College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Takashi Matsubara (https://orcid.org/0000-0003-0642-4800), Graduate School of Engineering Science, Osaka University, Osaka, Japan
Abstract: see the description field below.
Online access: https://ieeexplore.ieee.org/document/9520424/
Keywords: Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Rousslan Fernand Julien Dossa; Shengyi Huang; Santiago Ontanon; Takashi Matsubara |
title |
An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Code-level optimizations, which are low-level optimization techniques used in the implementation of algorithms, have generally been considered tangential and often do not appear in the published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest that these optimizations are critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization, known as “early stopping”, implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This optimization technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) divergence between the target policy and the current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are that 1) the performance of PPO is sensitive to the number of update iterations per epoch ($K$); 2) the early stopping optimizations (KLE-Stop and KLE-Rollback) *mitigate* this sensitivity by dynamically adjusting the actual number of update iterations within an epoch; and 3) early stopping could serve as a convenient alternative to tuning $K$. (A minimal illustrative sketch of the KLE-Stop/KLE-Rollback control flow is given after this record.) |
topic |
Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning |
url |
https://ieeexplore.ieee.org/document/9520424/ |
_version_ |
1721184939500634112 |
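
The description above is the one piece of technical prose in this record, so a short illustration may help. Below is a minimal PyTorch-style sketch of the KLE-Stop and KLE-Rollback control flow as summarized in the abstract; it is a sketch under assumptions, not the authors' code or the exact openai/spinningup implementation. In particular, the `policy` object with a `log_prob(obs, actions)` method, the batch keys `obs`/`actions`/`advantages`, the clip coefficient, and the `1.5 * target_kl` stopping threshold are all illustrative choices.

```python
# Illustrative sketch only: the policy interface, batch layout, and thresholds below are
# assumptions made for this example, not taken from the paper's implementation.
import copy
import torch


def update_policy(policy, optimizer, batch, num_iterations, target_kl,
                  clip_coef=0.2, rollback=False):
    """Run up to `num_iterations` (the paper's K) PPO policy updates on one batch,
    stopping early once the approximate mean KL to the data-collecting policy is too high."""
    with torch.no_grad():
        # Log-probabilities under the policy that collected the data (kept fixed).
        old_log_probs = policy.log_prob(batch["obs"], batch["actions"])

    for i in range(num_iterations):
        # KLE-Rollback keeps a parameter snapshot so the last update can be undone.
        snapshot = copy.deepcopy(policy.state_dict()) if rollback else None

        log_probs = policy.log_prob(batch["obs"], batch["actions"])
        ratio = torch.exp(log_probs - old_log_probs)
        # Standard clipped PPO surrogate objective.
        surrogate = torch.min(
            ratio * batch["advantages"],
            torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * batch["advantages"],
        )
        loss = -surrogate.mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        with torch.no_grad():
            # Common approximation of the mean KL divergence between old and current policy.
            new_log_probs = policy.log_prob(batch["obs"], batch["actions"])
            approx_kl = (old_log_probs - new_log_probs).mean().item()

        if approx_kl > 1.5 * target_kl:
            if rollback and snapshot is not None:
                # KLE-Rollback: revert to the parameters from before the offending update.
                policy.load_state_dict(snapshot)
            # KLE-Stop: skip the remaining update iterations for this batch.
            return i + 1

    return num_iterations
```

With `rollback=False` the loop simply stops once the approximate KL crosses the threshold (KLE-Stop); with `rollback=True` it additionally restores the parameter snapshot taken before the offending step (KLE-Rollback). In both cases the configured $K$ becomes an upper bound on the number of update iterations actually performed rather than a fixed count, which is the dynamic adjustment referred to in the abstract's second finding.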