Critical point-finding methods reveal gradient-flat regions of deep network losses

Although the loss functions of deep neural networks are highly nonconvex, gradient-based optimization algorithms converge to approximately the same performance from many random initializations. One thread of work has focused on explaining this phenomenon by numerically characterizing the...


Bibliographic Details
Main Authors: Bouchard, K. E. (Author), DeWeese, M. R. (Author), Frye, C. G. (Author), Ligeralde, A. (Author), Simon, J. (Author), Wadia, N. S. (Author)
Format: Article
Language: English
Published: MIT Press Journals 2021