Critical point-finding methods reveal gradient-flat regions of deep network losses
Although the loss functions of deep neural networks are highly nonconvex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by numerically characterizing the...
Format: Article
Language: English
Published: MIT Press Journals, 2021
Online Access: View Fulltext in Publisher