Multilevel minimization for deep residual networks

We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical systems viewpoint, which formulates a ResNet as the discretization of an initial...

Bibliographic Details
Main Authors:	Gaedke-Merzhäuser Lisa, Kopaničáková Alena, Krause Rolf
Format:	Article
Language:	English
Published:	EDP Sciences 2021-08-01
Series:	ESAIM: Proceedings and Surveys
Online Access:	https://www.esaim-proc.org/articles/proc/pdf/2021/02/proc2107112.pdf

id	doaj-e3e6087ccb534e058aa6b81de7697db3
record_format	Article
spelling	doaj-e3e6087ccb534e058aa6b81de7697db3; 2021-09-02T09:29:22Z; eng; EDP Sciences; ESAIM: Proceedings and Surveys; ISSN 2267-3059; 2021-08-01; vol. 71, pp. 131-144; doi:10.1051/proc/202171131; proc2107112; Multilevel minimization for deep residual networks; Gaedke-Merzhäuser Lisa; Kopaničáková Alena; Krause Rolf; Institute of Computational Science, Università della Svizzera italiana (all three authors); We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical systems viewpoint, which formulates a ResNet as the discretization of an initial value problem. The training process is then formulated as a time-dependent optimal control problem, which we discretize using different time-discretization parameters, eventually generating a multilevel hierarchy of auxiliary networks with different resolutions. The training of the original ResNet is then enhanced by training the auxiliary networks with reduced resolutions. By design, our framework is independent of the training strategy chosen on each level of the multilevel hierarchy. By means of numerical examples, we analyze the convergence behavior of the proposed method and demonstrate its robustness. For our examples, we employ multilevel gradient-based methods. Comparisons with standard single-level methods show a speedup of more than a factor of three while achieving the same validation accuracy.; https://www.esaim-proc.org/articles/proc/pdf/2021/02/proc2107112.pdf
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Gaedke-Merzhäuser Lisa Kopaničáková Alena Krause Rolf
author_sort	Gaedke-Merzhäuser Lisa
title	Multilevel minimization for deep residual networks
publisher	EDP Sciences
series	ESAIM: Proceedings and Surveys
issn	2267-3059
publishDate	2021-08-01
description	We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical systems viewpoint, which formulates a ResNet as the discretization of an initial value problem. The training process is then formulated as a time-dependent optimal control problem, which we discretize using different time-discretization parameters, eventually generating a multilevel hierarchy of auxiliary networks with different resolutions. The training of the original ResNet is then enhanced by training the auxiliary networks with reduced resolutions. By design, our framework is independent of the training strategy chosen on each level of the multilevel hierarchy. By means of numerical examples, we analyze the convergence behavior of the proposed method and demonstrate its robustness. For our examples, we employ multilevel gradient-based methods. Comparisons with standard single-level methods show a speedup of more than a factor of three while achieving the same validation accuracy.
url	https://www.esaim-proc.org/articles/proc/pdf/2021/02/proc2107112.pdf
_version_	1721177148950052864

Similar Items