An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization

Code-level optimizations, which are low-level optimization techniques used in the implementation of algorithms, have generally been considered as tangential and often do not appear in published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest these optimizations to be critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization known as "early stopping" implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This optimization technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) Divergence between the target policy and current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are 1) the performance of PPO is sensitive to the number of update iterations per epoch (K), 2) early stopping optimizations (KLE-Stop and KLE-Rollback) mitigate such sensitivity by dynamically adjusting the actual number of update iterations within an epoch, 3) early stopping optimizations could serve as a convenient alternative to tuning on K.
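
To make the described mechanism concrete, below is a minimal PyTorch-style sketch of one PPO policy-update epoch with KLE-Stop. The `policy(obs, act)` interface (returning the log-probability of `act` under the current policy), the function name, and the default values are illustrative assumptions, not taken from the paper; the `1.5 * target_kl` stopping threshold follows the convention used in spinningup-style PPO implementations.

```python
import torch

def ppo_policy_update(policy, optimizer, obs, act, adv, logp_old,
                      clip_ratio=0.2, train_pi_iters=80, target_kl=0.01):
    """One PPO epoch of policy updates with KLE-Stop early stopping.

    `policy(obs, act)` is an assumed interface returning the log-probability
    of `act` under the current policy parameters (illustrative, not a real API).
    """
    num_iters = 0
    for _ in range(train_pi_iters):                      # K update iterations per epoch
        optimizer.zero_grad()
        logp = policy(obs, act)
        ratio = torch.exp(logp - logp_old)               # pi(a|s) / pi_old(a|s)
        clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio) * adv
        loss = -torch.min(ratio * adv, clipped).mean()   # clipped surrogate objective

        # KLE-Stop: estimate the mean KL divergence between the data-collecting
        # (old) policy and the current policy, and end the epoch early once the
        # policy has drifted too far.
        approx_kl = (logp_old - logp).mean().item()
        if approx_kl > 1.5 * target_kl:
            break

        loss.backward()
        optimizer.step()
        num_iters += 1
    return num_iters  # actual number of update iterations performed this epoch
```

KLE-Rollback, the conservative variant mentioned in the abstract, would additionally (as the name suggests) revert the policy parameters to their values before the update that violated the KL threshold rather than merely stopping; that step is omitted from this sketch.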

Bibliographic Details
Main Authors: Rousslan Fernand Julien Dossa, Shengyi Huang, Santiago Ontanon, Takashi Matsubara
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Artificial Intelligence, deep learning, reinforcement learning, proximal policy optimization, robotics and automation, robot learning
Online Access: https://ieeexplore.ieee.org/document/9520424/
id doaj-bcdbb9c7e9f241aea5725e923918ef61
record_format Article
spelling doaj-bcdbb9c7e9f241aea5725e923918ef61 (record timestamp 2021-08-30T23:00:40Z)
IEEE Access, ISSN 2169-3536, Volume 9, 2021-01-01, pp. 117981-117992
DOI: 10.1109/ACCESS.2021.3106662 (IEEE Xplore article 9520424)
Rousslan Fernand Julien Dossa (https://orcid.org/0000-0003-0572-692X), Graduate School of System Informatics, Kobe University, Hyogo, Japan
Shengyi Huang, College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Santiago Ontanon, College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Takashi Matsubara (https://orcid.org/0000-0003-0642-4800), Graduate School of Engineering Science, Osaka University, Osaka, Japan
Title, abstract, subjects, and online access URL as listed above.
collection DOAJ
language English
format Article
sources DOAJ
author Rousslan Fernand Julien Dossa
Shengyi Huang
Santiago Ontanon
Takashi Matsubara
title An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Code-level optimizations, which are low-level optimization techniques used in the implementation of algorithms, have generally been considered as tangential and often do not appear in published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest these optimizations to be critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization known as "early stopping" implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This optimization technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) Divergence between the target policy and current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are 1) the performance of PPO is sensitive to the number of update iterations per epoch (K), 2) early stopping optimizations (KLE-Stop and KLE-Rollback) mitigate such sensitivity by dynamically adjusting the actual number of update iterations within an epoch, 3) early stopping optimizations could serve as a convenient alternative to tuning on K.
topic Artificial Intelligence
deep learning
reinforcement learning
proximal policy optimization
robotics and automation
robot learning
url https://ieeexplore.ieee.org/document/9520424/