An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization
Main Authors: | Rousslan Fernand Julien Dossa, Shengyi Huang, Santiago Ontanon, Takashi Matsubara |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Subjects: | Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning |
Online Access: | https://ieeexplore.ieee.org/document/9520424/ |
id |
doaj-bcdbb9c7e9f241aea5725e923918ef61 |
---|---|
record_format |
Article |
spelling |
Record ID: doaj-bcdbb9c7e9f241aea5725e923918ef61 (record timestamp 2021-08-30T23:00:40Z)
Language: eng
Published in: IEEE Access (IEEE), ISSN 2169-3536, vol. 9, 2021-01-01, pp. 117981-117992
DOI: 10.1109/ACCESS.2021.3106662 (IEEE article no. 9520424)
Title: An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization
Authors and affiliations:
Rousslan Fernand Julien Dossa (https://orcid.org/0000-0003-0572-692X), Graduate School of System Informatics, Kobe University, Hyogo, Japan
Shengyi Huang, College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Santiago Ontanon, College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Takashi Matsubara (https://orcid.org/0000-0003-0642-4800), Graduate School of Engineering Science, Osaka University, Osaka, Japan
Abstract: see the description field below.
Online access: https://ieeexplore.ieee.org/document/9520424/
Keywords: Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Rousslan Fernand Julien Dossa; Shengyi Huang; Santiago Ontanon; Takashi Matsubara |
title |
An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Code-level optimizations, which are low-level optimization techniques used in the implementation of algorithms, have generally been considered tangential and often do not appear in the published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest that these optimizations are critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization, known as “early stopping”, implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This optimization technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) divergence between the target policy and the current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are that 1) the performance of PPO is sensitive to the number of update iterations per epoch ($K$); 2) the early stopping optimizations (KLE-Stop and KLE-Rollback) *mitigate* this sensitivity by dynamically adjusting the actual number of update iterations within an epoch; and 3) early stopping could serve as a convenient alternative to tuning $K$. (A minimal illustrative sketch of the KLE-Stop/KLE-Rollback control flow is given after this record.) |
topic |
Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning |
url |
https://ieeexplore.ieee.org/document/9520424/ |
_version_ |
1721184939500634112 |
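
The description above is the one piece of technical prose in this record, so a short illustration may help. Below is a minimal PyTorch-style sketch of the KLE-Stop and KLE-Rollback control flow as summarized in the abstract; it is a sketch under assumptions, not the authors' code or the exact openai/spinningup implementation. In particular, the `policy` object with a `log_prob(obs, actions)` method, the batch keys `obs`/`actions`/`advantages`, the clip coefficient, and the `1.5 * target_kl` stopping threshold are all illustrative choices.

```python
# Illustrative sketch only: the policy interface, batch layout, and thresholds below are
# assumptions made for this example, not taken from the paper's implementation.
import copy
import torch


def update_policy(policy, optimizer, batch, num_iterations, target_kl,
                  clip_coef=0.2, rollback=False):
    """Run up to `num_iterations` (the paper's K) PPO policy updates on one batch,
    stopping early once the approximate mean KL to the data-collecting policy is too high."""
    with torch.no_grad():
        # Log-probabilities under the policy that collected the data (kept fixed).
        old_log_probs = policy.log_prob(batch["obs"], batch["actions"])

    for i in range(num_iterations):
        # KLE-Rollback keeps a parameter snapshot so the last update can be undone.
        snapshot = copy.deepcopy(policy.state_dict()) if rollback else None

        log_probs = policy.log_prob(batch["obs"], batch["actions"])
        ratio = torch.exp(log_probs - old_log_probs)
        # Standard clipped PPO surrogate objective.
        surrogate = torch.min(
            ratio * batch["advantages"],
            torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * batch["advantages"],
        )
        loss = -surrogate.mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        with torch.no_grad():
            # Common approximation of the mean KL divergence between old and current policy.
            new_log_probs = policy.log_prob(batch["obs"], batch["actions"])
            approx_kl = (old_log_probs - new_log_probs).mean().item()

        if approx_kl > 1.5 * target_kl:
            if rollback and snapshot is not None:
                # KLE-Rollback: revert to the parameters from before the offending update.
                policy.load_state_dict(snapshot)
            # KLE-Stop: skip the remaining update iterations for this batch.
            return i + 1

    return num_iterations
```

With `rollback=False` the loop simply stops once the approximate KL crosses the threshold (KLE-Stop); with `rollback=True` it additionally restores the parameter snapshot taken before the offending step (KLE-Rollback). In both cases the configured $K$ becomes an upper bound on the number of update iterations actually performed rather than a fixed count, which is the dynamic adjustment referred to in the abstract's second finding.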