Z. Allen-Zhu and Y. Li, Backward feature correction: How deep learning performs deep learning, 2020.

Z. Allen-Zhu, Y. Li, and Z. Song, A convergence theory for deep learning via over-parameterization, Proceedings of the International Conference on Machine Learning (ICML), 2019.

K. Atkinson and W. Han, Spherical harmonics and approximations on the unit sphere: an introduction, Lecture Notes in Mathematics, vol.2044, Springer, 2012.

D. Azevedo and V. A. Menegatto, Sharp estimates for eigenvalues of integral operators generated by dot product kernels on the sphere, Journal of Approximation Theory, vol.177, pp.57-68, 2014.

F. Bach, Sharp analysis of low-rank kernel matrix approximations, Conference on Learning Theory (COLT), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00723365

F. Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.629-681, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01098505

F. Bach, On the equivalence between kernel quadrature rules and random feature expansions, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.714-751, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01118276

R. Basri, D. Jacobs, Y. Kasten, and S. Kritchman, The convergence rate of neural networks for learned functions of different frequencies, Advances in Neural Information Processing Systems (NeurIPS), 2019.

R. Basri, M. Galun, A. Geifman, D. Jacobs, Y. Kasten et al., Frequency bias in neural networks for input of non-uniform density, Proceedings of the International Conference on Machine Learning (ICML), 2020.

M. Belkin, S. Ma, and S. Mandal, To understand deep learning we need to understand kernel learning, Proceedings of the International Conference on Machine Learning (ICML), 2018.

A. Bietti and J. Mairal, Group invariance, stability to deformations, and complexity of deep convolutional representations, Journal of Machine Learning Research (JMLR), vol.20, issue.25, pp.1-49, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01536004

A. Bietti and J. Mairal, On the inductive bias of neural tangent kernels, Advances in Neural Information Processing Systems (NeurIPS), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02144221

A. Caponnetto and E. De Vito, Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.

L. Chen and S. Xu, Deep neural tangent kernel and Laplace kernel have the same RKHS, 2020.

M. Chen, Y. Bai, J. D. Lee, T. Zhao, H. Wang et al., Towards understanding hierarchical learning: Benefits of neural representations, 2020.

L. Chizat and F. Bach, On the global convergence of gradient descent for over-parameterized models using optimal transport, Advances in Neural Information Processing Systems (NeurIPS), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01798792

L. Chizat, E. Oyallon, and F. Bach, On lazy training in differentiable programming, Advances in Neural Information Processing Systems (NeurIPS), 2019.
URL : https://hal.archives-ouvertes.fr/hal-01945578

Y. Cho and L. K. Saul, Kernel methods for deep learning, Advances in Neural Information Processing Systems (NIPS), 2009.

F. Cucker and S. Smale, On the mathematical foundations of learning, Bulletin of the American Mathematical Society, vol.39, issue.1, pp.1-49, 2002.

A. Daniely, Depth separation for neural networks, Conference on Learning Theory (COLT), 2017.

A. Daniely, R. Frostig, and Y. Singer, Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016.

S. S. Du, J. D. Lee, H. Li, L. Wang, and X. Zhai, Gradient descent finds global minima of deep neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.

S. S. Du, X. Zhai, B. Poczos, and A. Singh, Gradient descent provably optimizes over-parameterized neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

C. Efthimiou and C. Frye, Spherical harmonics in p dimensions, 2014.

N. El Karoui, The spectrum of kernel random matrices, The Annals of Statistics, vol.38, issue.1, pp.1-50, 2010.

R. Eldan and O. Shamir, The power of depth for feedforward neural networks, Conference on Learning Theory (COLT), 2016.

A. Geifman, A. Yadav, Y. Kasten, M. Galun, D. Jacobs et al., On the similarity between the Laplace and neural tangent kernels, 2020.

B. Ghorbani, S. Mei, T. Misiakiewicz, and A. Montanari, Linearized two-layers neural networks in high dimension, 2019.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989.

M. Ismail, Classical and quantum orthogonal polynomials in one variable, vol.13, 2005.

A. Jacot, F. Gabriel, and C. Hongler, Neural tangent kernel: Convergence and generalization in neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01824549

J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington et al., Deep neural networks as Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Y. Li and Y. Liang, Learning overparameterized neural networks via stochastic gradient descent on structured data, Advances in Neural Information Processing Systems (NeurIPS), 2018.

T. Liang, A. Rakhlin, and X. Zhai, On the risk of minimum-norm interpolants and restricted lower isometry of kernels, Conference on Learning Theory (COLT), 2020.

A. Matthews, M. Rowland, J. Hron, R. E. Turner, and Z. Ghahramani, Gaussian process behaviour in wide deep neural networks, 2018.

S. Mei, A. Montanari, and P. Nguyen, A mean field view of the landscape of two-layer neural networks, Proceedings of the National Academy of Sciences, vol.115, issue.33, pp.7665-7671, 2018.

H. N. Mhaskar and T. Poggio, Deep vs. shallow networks: An approximation theory perspective, Analysis and Applications, vol.14, issue.06, pp.829-848, 2016.

R. M. Neal, Bayesian learning for neural networks, 1996.

A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica, vol.8, pp.143-195, 1999.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems (NIPS), 2007.

A. Rudi and L. Rosasco, Generalization properties of learning with random features, Advances in Neural Information Processing Systems (NIPS), pp.3215-3225, 2017.

M. Scetbon and Z. Harchaoui, Risk bounds for multi-layer perceptrons through spectra of integral operators, 2020.

J. Schmidt-Hieber, Nonparametric regression using deep neural networks with ReLU activation function, Annals of Statistics, vol.48, issue.4, pp.1875-1897, 2020.

A. J. Smola, Z. L. Ovari, and R. C. Williamson, Regularization with dot-product kernels, Advances in Neural Information Processing Systems (NIPS), 2001.

M. Telgarsky, Benefits of depth in neural networks, Conference on Learning Theory (COLT), 2016.

D. Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Networks, vol.94, pp.103-114, 2017.

D. Zou, Y. Cao, D. Zhou, and Q. Gu, Stochastic gradient descent optimizes over-parameterized deep ReLU networks, Machine Learning, 2019.