A Note on Lazy Training in Supervised Differentiable Programming

Lénaïc Chizat 1, Francis Bach 2,1
1 SIERRA - Statistical Machine Learning and Parsimony
2 DI-ENS - Département d'informatique de l'École normale supérieure, CNRS - Centre National de la Recherche Scientifique, Inria de Paris
Abstract: In a series of recent theoretical works, it has been shown that strongly over-parameterized neural networks trained with gradient-based methods could converge linearly to zero training loss, with their parameters hardly varying. In this note, our goal is to exhibit the simple structure that is behind these results. In a simplified setting, we prove that "lazy training" essentially solves a kernel regression. We also show that this behavior is not so much due to over-parameterization as to a choice of scaling, often implicit, that makes it possible to linearize the model around its initialization. These theoretical results, complemented with simple numerical experiments, make it seem unlikely that "lazy training" is behind the many successes of neural networks in high-dimensional tasks.
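
The abstract's claim that a large scaling keeps the parameters nearly fixed and makes the trained model agree with its linearization around initialization can be illustrated numerically. The Python/NumPy sketch below is not the authors' code and does not reproduce the paper's experiments: the toy data, the two-layer tanh network, the centering of the model at its initial output, and the step-size/iteration choices are all illustrative assumptions. It trains the scaled model α(h(w) − h(w₀)) by gradient descent on the rescaled squared loss and reports, for increasing α, how far the parameters move and how closely the trained model matches its tangent (linearized) model.

```python
import numpy as np

# Minimal sketch of the scaling argument (illustrative assumptions throughout).
rng = np.random.default_rng(0)
n, d, m = 20, 5, 200                       # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W0 = rng.standard_normal((m, d))           # initial parameters w0 = (W0, a0)
a0 = rng.standard_normal(m)

def h(W, a):
    """Two-layer network h(w; x) = a^T tanh(W x) / sqrt(m), on the training set."""
    return np.tanh(X @ W.T) @ a / np.sqrt(m)

h_init = h(W0, a0)                         # subtracted so the scaled model starts at 0

def train(alpha, steps=20000, lr=0.2):
    """Gradient descent on F(w) = ||alpha (h(w) - h(w0)) - y||^2 / (2 n alpha^2)."""
    W, a = W0.copy(), a0.copy()
    for _ in range(steps):
        feat = np.tanh(X @ W.T)                           # (n, m) hidden features
        p = alpha * (feat @ a / np.sqrt(m) - h_init)      # scaled, centered predictions
        r = (p - y) / (n * alpha)                         # dF/dh; one alpha cancels
        grad_a = feat.T @ r / np.sqrt(m)
        grad_W = a[:, None] * ((r[:, None] * (1 - feat**2)).T @ X) / np.sqrt(m)
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

# Jacobian of h at initialization, defining the tangent (linearized) model.
feat0 = np.tanh(X @ W0.T)
J_W = (a0[None, :, None] * (1 - feat0**2)[:, :, None] * X[:, None, :]).reshape(n, -1)
J = np.concatenate([J_W, feat0], axis=1) / np.sqrt(m)     # (n, m*d + m)

for alpha in [1.0, 10.0, 100.0]:
    W, a = train(alpha)
    delta = np.concatenate([(W - W0).ravel(), a - a0])    # parameter displacement
    pred = alpha * (h(W, a) - h_init)                     # trained nonlinear model
    pred_lin = alpha * (J @ delta)                        # tangent-model predictions
    print(f"alpha={alpha:6.1f}  loss={np.mean((pred - y) ** 2) / 2:8.5f}  "
          f"|w - w0|={np.linalg.norm(delta):7.4f}  "
          f"gap to tangent model={np.linalg.norm(pred - pred_lin) / np.linalg.norm(y):.4f}")
```

Under these assumptions, all values of α should reach a comparable training loss, while the parameter displacement shrinks roughly like 1/α and the gap between the trained model and its tangent model shrinks with it, consistent with the note's point that laziness comes from the scaling rather than from the width m alone.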
Document type: Preprints, Working Papers, ...

https://hal.inria.fr/hal-01945578
Contributor: Lénaïc Chizat
Submitted on: Thursday, February 21, 2019 - 7:56:20 AM
Last modification on: Friday, February 22, 2019 - 1:27:25 AM

Files

chizatbach2018lazy.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01945578, version 3
  • arXiv: 1812.07956

Citation

Lenaic Chizat, Francis Bach. A Note on Lazy Training in Supervised Differentiable Programming. 2019. 〈hal-01945578v3〉
