Abraham, R., Marsden, J. E., Ratiu, T. Manifolds, Tensor Analysis, and Applications, vol. 75, Springer, 2012.
Allen-Zhu, Z., Li, Y., Song, Z. A convergence theory for deep learning via over-parameterization, 2018.
Allen-Zhu, Z., Li, Y., Liang, Y. Learning and generalization in overparameterized neural networks, going beyond two layers, 2018.
Arora, S., Du, S. S., Hu, W., Li, Z., Salakhutdinov, R., Wang, R. On exact computation with an infinitely wide neural net, 2019.
Arora, S., Du, S. S., Hu, W., Li, Z., Wang, R. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks, 2019.
Bottou, L., Curtis, F. E., Nocedal, J. Optimization methods for large-scale machine learning, SIAM Review, vol. 60, no. 2, pp. 223-311, 2018.
DOI : 10.1137/16m1080173
URL : http://arxiv.org/pdf/1606.04838
On Lipschitz maps and their flows, 2015.
Cao, Y., Gu, Q. A generalization theory of gradient descent for learning over-parameterized deep ReLU networks, 2019.
Carratino, L., Rudi, A., Rosasco, L. Learning with SGD and random features, Advances in Neural Information Processing Systems, pp. 10192-10203, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01958906
Chizat, L., Bach, F. On the global convergence of gradient descent for over-parameterized models using optimal transport, Advances in Neural Information Processing Systems, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01798792
Yao, Y., Rosasco, L., Caponnetto, A. On early stopping in gradient descent learning, Constructive Approximation, vol. 26, pp. 289-315, 2007.
DOI : 10.1007/s00365-006-0663-2
Yehudai, G., Shamir, O. On the power and limitations of random features for understanding neural networks, 2019.
Zagoruyko, S., Komodakis, N. Wide residual networks, Proceedings of the British Machine Vision Conference (BMVC), pp. 87.1-87.12, 2016.
DOI : 10.5244/c.30.87
URL : https://hal.archives-ouvertes.fr/hal-01832503
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O. Understanding deep learning requires rethinking generalization, International Conference on Learning Representations, 2017.
Zhang, H., Yu, D., Chen, W., Liu, T.-Y. Training over-parameterized deep ResNet is almost as easy as training a two-layer network, 2019.
Zou, D., Cao, Y., Zhou, D., Gu, Q. Stochastic gradient descent optimizes over-parameterized deep ReLU networks, 2018.