N. Duffy and D. Helmbold, Boosting methods for regression, Machine Learning, vol.47, pp.153-200, 2002.

P. Bühlmann and B. Yu, Boosting with the L2 loss: regression and classification, J. Amer. Statist. Assoc., vol.98, issue.462, pp.324-339, 2003.

M. Anthony and P. L. Bartlett, Neural network learning: theoretical foundations, 1999.

I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, 2016.

S. Jastrzebski, Z. Kenton, N. Ballas, A. Fischer, Y. Bengio et al., On the relation between the sharpest directions of DNN loss and the SGD step length, 2018.

Y. Li and Y. Liang, Learning overparameterized neural networks via stochastic gradient descent on structured data, Advances in Neural Information Processing Systems, pp.8157-8166, 2018.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A distribution-free theory of nonparametric regression, Springer Series in Statistics, 2002.

A. B. Tsybakov, Introduction to nonparametric estimation, Springer Series in Statistics, 2009.

E. Giné and R. Nickl, Mathematical foundations of infinite-dimensional statistical models, 2016.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.

B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, pp.144-152, 1992.

F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc. (N.S.), vol.39, issue.1, pp.1-49, 2002.

I. Steinwart and A. Christmann, Support vector machines. Information Science and Statistics, 2008.

A. Caponnetto and E. De Vito, Optimal rates for the regularized least-squares algorithm, Found. Comput. Math, vol.7, issue.3, pp.331-368, 2007.

A. Jacot, F. Gabriel, and C. Hongler, Neural tangent kernel: Convergence and generalization in neural networks, Advances in neural information processing systems, pp.8571-8580, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01824549

B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2001.

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.

F. Bauer, S. Pereverzev, and L. Rosasco, On regularization algorithms in learning theory, J. Complexity, vol.23, issue.1, pp.52-72, 2007.

G. Blanchard and N. Mücke, Optimal rates for regularization of statistical inverse learning problems, Found. Comput. Math, vol.18, issue.4, pp.971-1013, 2018.

J. Lin, A. Rudi, L. Rosasco, and V. Cevher, Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces, Appl. Comput. Harmon. Anal, vol.48, issue.3, pp.868-890, 2020.
URL : https://hal.archives-ouvertes.fr/hal-01958890

G. Raskutti, M. J. Wainwright, and B. Yu, Early stopping and non-parametric regression: an optimal data-dependent stopping rule, J. Mach. Learn. Res, vol.15, pp.335-366, 2014.

A. W. van der Vaart and J. A. Wellner, Weak convergence and empirical processes, 1996.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

E. De Vito, S. Pereverzyev, and L. Rosasco, Adaptive kernel methods using the balancing principle, Found. Comput. Math, vol.10, issue.4, pp.455-479, 2010.

G. Blanchard, P. Mathé, and N. Mücke, Lepskii principle in supervised learning, 2019.

L. Prechelt, Early stopping-but when?, Neural Networks: Tricks of the trade, pp.55-69, 1998.

T. Zhang and B. Yu, Boosting with early stopping: Convergence and consistency, Ann. Statist, vol.33, issue.4, pp.1538-1579, 2005.

Y. Yao, L. Rosasco, and A. Caponnetto, On early stopping in gradient descent learning, Constr. Approx, vol.26, issue.2, pp.289-315, 2007.

Y. Wei, F. Yang, and M. J. Wainwright, Early stopping for kernel boosting algorithms: a general analysis with localized complexities, IEEE Trans. Inform. Theory, vol.65, issue.10, pp.6685-6703, 2019.

G. Blanchard, M. Hoffmann, and M. Reiß, Early stopping for statistical inverse problems via truncated SVD estimation, Electron. J. Stat, vol.12, issue.2, pp.3204-3231, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01966326

G. Blanchard, M. Hoffmann, and M. Reiß, Optimal adaptation for early stopping in statistical inverse problems, SIAM/ASA J. Uncertain. Quantif, vol.6, issue.3, pp.1043-1075, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01426253

R. Vershynin, High-dimensional probability, 2018.

E. De Vito, L. Rosasco, A. Caponnetto, U. De Giovannini, and F. Odone, Learning from examples as an inverse problem, J. Mach. Learn. Res, vol.6, pp.883-904, 2005.

S. Lu and S. V. Pereverzev, Regularization theory for ill-posed problems, 2013.

S. Smale and D. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx, vol.26, issue.2, pp.153-172, 2007.

T. Zhang, Effective dimension and generalization of kernel learning, Advances in Neural Information Processing Systems, pp.471-478, 2003.

S. Smale and D. Zhou, Shannon sampling. II. Connections to learning theory, Appl. Comput. Harmon. Anal, vol.19, issue.3, pp.285-302, 2005.

G. Blanchard and N. Krämer, Convergence rates of kernel conjugate gradient for random design regression, Anal. Appl. (Singap.), vol.14, issue.6, pp.763-794, 2016.

L. Pillaud-Vivien, A. Rudi, and F. Bach, Statistical optimality of stochastic gradient descent on hard learning problems through multiple passes, Advances in Neural Information Processing Systems, pp.8114-8124, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01799116

S. Fischer and I. Steinwart, Sobolev norm learning rates for regularized least-squares algorithms, 2019.

S. Page and S. Grünewälder, The Goldenshluger-Lepski method for constrained least-squares estimators over RKHSs, 2018.

E. Brunel, A. Mas, and A. Roche, Non-asymptotic adaptive prediction in functional linear models, J. Multivariate Anal, vol.143, pp.208-232, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01817385

H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, 1996.

G. Blanchard and P. Mathé, Discrepancy principle for statistical inverse problems with application to conjugate gradient iteration. Inverse Problems, vol.28, p.23, 2012.

P. L. Bartlett, P. M. Long, G. Lugosi et al., Benign overfitting in linear regression, 2019.

B. Stankewitz, Smoothed residual stopping for statistical inverse problems via truncated SVD estimation, 2019.

P. L. Bartlett, O. Bousquet, and S. Mendelson, Local Rademacher complexities, Ann. Statist, vol.33, issue.4, pp.1497-1537, 2005.

V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Statist, vol.34, issue.6, pp.2593-2656, 2006.

V. Koltchinskii, Oracle inequalities in empirical risk minimization and sparse recovery problems, Lecture Notes in Mathematics, vol.2033, 2011.

J. Tropp, An introduction to matrix concentration inequalities, 2015.

S. Minsker, On some extensions of Bernstein's inequality for self-adjoint operators, Statist. Probab. Lett, vol.127, pp.111-119, 2017.

L. H. Dicker, D. P. Foster, and D. Hsu, Kernel ridge vs. principal component regression: minimax bounds and the qualification of regularization operators, Electron. J. Stat, vol.11, issue.1, pp.1022-1047, 2017.

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: a nonasymptotic theory of independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794821

A. Caponnetto, Optimal rates for regularization operators in learning theory, 2006.

R. Bhatia, Matrix analysis, 1997.

C. Milbradt and M. Wahl, High-probability bounds for the reconstruction error of PCA, Statist. Probab. Lett, vol.161, p.108741, 2020.