Summary: Starting from the principle of least action (of which Fermat's principle in optics is the prototype), which governs classical and quantum mechanics, and from the theory of exterior differential forms, which governs the geometry of curved manifolds, we show how to derive the equations governing neural networks in an intrinsic, coordinate-invariant way, in which the loss function plays the role of the Hamiltonian. Covariance of these equations requires a layer metric, which is instrumental in pretraining and explains the role of conjugation when complex numbers are used. The differential formalism clarifies the relation of the gradient descent optimizer to Aristotelian and Newtonian mechanics. The Bayesian paradigm is then analyzed as a renormalizable theory, yielding a new derivation of the Bayesian information criterion. We hope that this formal presentation of the differential geometry of neural networks will encourage some physicists to dive into deep learning and, reciprocally, that specialists in deep learning will better appreciate the close interconnection of their subject with the foundations of classical and quantum field theory.