Critical point-finding methods reveal gradient-flat regions of deep network losses

Although the loss functions of deep neural networks are highly nonconvex, gradient-based optimization algorithms converge to approximately the same performance from many random initializations. One thread of work has focused on explaining this phenomenon by numerically characterizing the...


Bibliographic Details
Main Authors: Bouchard, K. E. (Author), DeWeese, M. R. (Author), Frye, C. G. (Author), Ligeralde, A. (Author), Simon, J. (Author), Wadia, N. S. (Author)
Format: Article
Language: English
Published: MIT Press Journals 2021