SGD with Variance Reduction beyond Empirical Risk Minimization, 2015.

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology, vol.33, issue.8, p.831, 2015.

Towards a coherent statistical framework for dense deformable template estimation, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.69, issue.1, pp.3-29, 2007.

Katyusha: The first direct acceleration of stochastic gradient methods, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.8194-8244, 2017.

What can ResNet learn efficiently, going beyond kernels?, Advances in Neural Information Processing Systems (NeurIPS), 2019.

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters, Advances in Neural Information Processing Systems (NIPS), 2016.

Learning and generalization in overparameterized neural networks, going beyond two layers, Advances in Neural Information Processing Systems (NeurIPS), 2019.

A convergence theory for deep learning via overparameterization, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, vol.25, issue.17, pp.3389-3402, 1997.

Deep scattering spectrum, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4114-4128, 2014.

Deep convolutional networks are hierarchical kernel machines, 2015.

On invariance and selectivity in representation learning, Information and Inference, vol.5, issue.2, pp.134-158, 2016.

Neural network learning: Theoretical foundations, 2009.

On gradient regularizers for MMD GANs, Advances in Neural Information Processing Systems (NeurIPS), 2018.

Proceedings of the International Conference on Machine Learning (ICML), 2017.

Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.

Stronger generalization bounds for deep nets via a compression approach, Proceedings of the International Conference on Machine Learning (ICML), 2018.

On exact computation with an infinitely wide neural net, Advances in Neural Information Processing Systems (NeurIPS), 2019.

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Spherical harmonics and approximations on the unit sphere: an introduction, vol.2044, 2012.

Sharp analysis of low-rank kernel matrix approximations, Conference on Learning Theory (COLT), 2013. URL: https://hal.archives-ouvertes.fr/hal-00723365

Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research (JMLR), vol.18, issue.19, pp.1-53, 2017. URL: https://hal.archives-ouvertes.fr/hal-01098505

On the equivalence between kernel quadrature rules and random feature expansions, Journal of Machine Learning Research (JMLR), vol.18, issue.21, pp.1-38, 2017. URL: https://hal.archives-ouvertes.fr/hal-01118276

Kernel independent component analysis, Journal of Machine Learning Research (JMLR), vol.3, pp.1-48, 2002.

Predictive low-rank decomposition for kernel methods, Proceedings of the International Conference on Machine Learning (ICML), 2005.

Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011. URL: https://hal.archives-ouvertes.fr/hal-00608041

Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013. URL: https://hal.archives-ouvertes.fr/hal-00831977

Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol.3, pp.463-482, 2002.

Local Rademacher complexities, The Annals of Statistics, vol.33, issue.4, pp.1497-1537, 2005.

Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.

Spectrally-normalized margin bounds for neural networks, Advances in Neural Information Processing Systems (NIPS), 2017.

Benign overfitting in linear regression, 2019.

The convergence rate of neural networks for learned functions of different frequencies, Advances in Neural Information Processing Systems (NeurIPS), 2019.

Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, Advances in Neural Information Processing Systems (NeurIPS), 2018.

To understand deep learning we need to understand kernel learning, Proceedings of the International Conference on Machine Learning (ICML), 2018.

Does data interpolation contradict statistical optimality?, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

Convex neural networks, Advances in Neural Information Processing Systems (NIPS), 2006.

Reproducing kernel Hilbert spaces in probability and statistics, 2004.

Invariance and stability of deep convolutional representations, Advances in Neural Information Processing Systems (NIPS), 2017. URL: https://hal.archives-ouvertes.fr/hal-01630265

Stochastic optimization with variance reduction for infinite datasets with finite sum structure, Advances in Neural Information Processing Systems (NIPS), 2017. URL: https://hal.archives-ouvertes.fr/hal-01375816

Group invariance, stability to deformations, and complexity of deep convolutional representations, Journal of Machine Learning Research, vol.20, issue.25, pp.1-49, 2019. URL: https://hal.archives-ouvertes.fr/hal-01536004

On the inductive bias of neural tangent kernels, Advances in Neural Information Processing Systems (NeurIPS), 2019. URL: https://hal.archives-ouvertes.fr/hal-02144221

A contextual bandit bake-off, 2018. URL: https://hal.archives-ouvertes.fr/hal-01708310

A kernel perspective for regularizing deep neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019. URL: https://hal.archives-ouvertes.fr/hal-01884632

Wild patterns: Ten years after the rise of adversarial machine learning, Pattern Recognition, vol.84, pp.317-331, 2018.

Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Kernel descriptors for visual recognition, Advances in Neural Information Processing Systems (NIPS), 2010.

Object recognition with hierarchical kernel descriptors, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

The Tradeoffs of Large Scale Learning, Advances in Neural Information Processing Systems (NIPS), 2008.

Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

Theory of classification: A survey of some recent advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005. URL: https://hal.archives-ouvertes.fr/hal-00017923

On invariance in hierarchical models, Advances in Neural Information Processing Systems (NIPS), 2009.

Geometric deep learning: going beyond Euclidean data, IEEE Signal Processing Magazine, vol.34, issue.4, pp.18-42, 2017.

Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.35, pp.1872-1886, 2013.

Learning stable group invariant representations with convolutional networks, 2013.

Convex optimization: Algorithms and complexity, Foundations and Trends in Machine Learning, vol.8, issue.3-4, 2015.

Generalization bounds of stochastic gradient descent for wide and deep neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2019.

Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.

Opportunities and obstacles for deep learning in biology and medicine, Journal of The Royal Society Interface, vol.15, issue.141, 2018.

On the global convergence of gradient descent for overparameterized models using optimal transport, Advances in Neural Information Processing Systems (NeurIPS), 2018. URL: https://hal.archives-ouvertes.fr/hal-01798792

On lazy training in differentiable programming, Advances in Neural Information Processing Systems (NeurIPS), 2019. URL: https://hal.archives-ouvertes.fr/hal-01945578

Kernel methods for deep learning, Advances in Neural Information Processing Systems (NIPS), 2009.

Parseval networks: Improving robustness to adversarial examples, International Conference on Machine Learning (ICML), 2017.

An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

Certified adversarial robustness via randomized smoothing, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Group equivariant convolutional networks, International Conference on Machine Learning (ICML), 2016.

Spherical CNNs, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

On the mathematical foundations of learning, Bulletin of the American Mathematical Society, vol.39, issue.1, pp.1-49, 2002.

SGD learns the conjugate kernel class of the network, Advances in Neural Information Processing Systems (NIPS), 2017.

Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016.

Random features for compositional kernels, 2017.

SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014. URL: https://hal.archives-ouvertes.fr/hal-01016843

Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of the International Conference on Machine Learning (ICML), 2014.

A probabilistic theory of pattern recognition, 1996.

Vector Measures, 1977.

Nonparametric stochastic approximation with large stepsizes, The Annals of Statistics, vol.44, issue.4, pp.1363-1399, 2016.

Harder, better, faster, stronger convergence rates for least-squares regression, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.3520-3570, 2017. URL: https://hal.archives-ouvertes.fr/hal-01275431

Double backpropagation increasing generalization performance, International Joint Conference on Neural Networks (IJCNN), 1991.

Gradient descent finds global minima of deep neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Gradient descent provably optimizes overparameterized neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research (JMLR), vol.10, pp.2899-2934, 2009.

Privacy aware learning, Advances in Neural Information Processing Systems (NIPS), 2012.

Training generative neural networks via maximum mean discrepancy optimization, Conference on Uncertainty in Artificial Intelligence (UAI), 2015.

Spherical harmonics in p dimensions, 2014.

Fast randomized kernel ridge regression with statistical guarantees, Advances in Neural Information Processing Systems (NIPS), 2015.

Exploring the landscape of spatial robustness, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Efficient SVM training using low-rank kernel representations, Journal of Machine Learning Research, vol.2, pp.243-264, 2001.

Sobolev norm learning rates for regularized least-squares algorithm, 2017.

A course in abstract harmonic analysis, 2016.

Deep convolutional networks as shallow Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

Linearized two-layers neural networks in high dimension, 2019.

Size-independent sample complexity of neural networks, Conference on Learning Theory (COLT), 2018.

A kernel two-sample test, Journal of Machine Learning Research, vol.13, pp.723-773, 2012.

Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems (NIPS), 2017.

Implicit regularization in matrix factorization, Advances in Neural Information Processing Systems (NIPS), 2017.

Implicit bias of gradient descent on linear convolutional networks, Advances in Neural Information Processing Systems (NeurIPS), 2018.

A distribution-free theory of nonparametric regression, 2006.

Invariant kernel functions for pattern analysis and machine learning, Machine Learning, vol.68, issue.1, pp.35-61, 2007.

Motif kernel generated by genetic programming improves remote homology and fold detection, BMC Bioinformatics, vol.8, issue.1, p.23, 2007.

The elements of statistical learning, 2009.

Statistical learning with sparsity: the lasso and generalizations, 2015.

Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Convex analysis and minimization algorithms I: Fundamentals, Springer Science & Business Media, 1993.

Variance Reduced Stochastic Gradient Descent with Neighbors, Advances in Neural Information Processing Systems (NIPS), 2015. URL: https://hal.archives-ouvertes.fr/hal-01248672

Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989.

Random design analysis of ridge regression, Foundations of Computational Mathematics, vol.14, issue.3, 2014.

Neural tangent kernel: Convergence and generalization in neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2018. URL: https://hal.archives-ouvertes.fr/hal-01824549

Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.

On the complexity of linear prediction: Risk bounds, margin bounds, and regularization, Advances in Neural Information Processing Systems (NIPS), 2009.

Adversarial risk bounds via function transformation, 2018.

Some results on Tchebycheffian spline functions, Journal of Mathematical Analysis and Applications, vol.33, issue.1, pp.82-95, 1971.

Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, vol.34, issue.6, pp.2593-2656, 2006.

Empirical margin distributions and bounding the generalization error of combined classifiers, The Annals of Statistics, vol.30, pp.1-50, 2002.

On the generalization of equivariance and convolution in neural networks to the action of compact groups, Proceedings of the International Conference on Machine Learning (ICML), 2018.

ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012.

Estimate sequences for stochastic composite optimization: Variance reduction, acceleration, and robustness to noise, 2019. URL: https://hal.archives-ouvertes.fr/hal-01993531

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012. URL: https://hal.archives-ouvertes.fr/hal-00768187

An optimal randomized incremental gradient method, 2017.

An iteration formula for Fredholm integral equations of the first kind, American Journal of Mathematics, vol.73, issue.3, pp.615-624, 1951.

Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.

Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

Certified robustness to adversarial examples with differential privacy, IEEE Symposium on Security and Privacy (SP), 2019.

Deep neural networks as Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Wide neural networks of any depth evolve as linear models under gradient descent, Advances in Neural Information Processing Systems (NeurIPS), 2019.

MMD GAN: Towards deeper understanding of moment matching network, Advances in Neural Information Processing Systems (NIPS), 2017.

Learning overparameterized neural networks via stochastic gradient descent on structured data, Advances in Neural Information Processing Systems (NeurIPS), 2018.

Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations, Conference on Learning Theory (COLT), 2018.

Just interpolate: Kernel "ridgeless" regression can generalize, Annals of Statistics, 2019.

Fisher-Rao metric, geometry, and complexity of neural networks, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.

A Universal Catalyst for First-Order Optimization, Advances in Neural Information Processing Systems (NIPS), 2015. URL: https://hal.archives-ouvertes.fr/hal-01160728

Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces, Applied and Computational Harmonic Analysis, 2018. URL: https://hal.archives-ouvertes.fr/hal-01958890

Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.

A unified gradient regularization family for adversarial examples, IEEE International Conference on Data Mining (ICDM), 2015.

Learning word vectors for sentiment analysis, The 49th Annual Meeting of the Association for Computational Linguistics (ACL), pp.142-150, 2011.

Towards deep learning models resistant to adversarial attacks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning, SIAM Journal on Optimization, vol.25, issue.2, pp.829-855, 2015.

End-to-End Kernel Learning with Supervised Convolutional Kernel Networks, Advances in Neural Information Processing Systems (NIPS), 2016.

Convolutional kernel networks, Advances in Neural Information Processing Systems (NIPS), 2014. URL: https://hal.archives-ouvertes.fr/hal-01005489

Group invariant scattering, Communications on Pure and Applied Mathematics, vol.65, issue.10, pp.1331-1398, 2012.

Smooth discrimination analysis, The Annals of Statistics, vol.27, pp.1808-1829, 1999.

Risk bounds for statistical learning, The Annals of Statistics, vol.34, issue.5, pp.2326-2366, 2006.

Gaussian process behaviour in wide deep neural networks, 2018.

A mean field view of the landscape of two-layer neural networks, Proceedings of the National Academy of Sciences, vol.115, issue.33, pp.7665-7671, 2018.

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit, Conference on Learning Theory (COLT), 2019.

Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.

Spectral normalization for generative adversarial networks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Virtual adversarial training: a regularization method for supervised and semi-supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2018.

Kernel analysis of deep networks, Journal of Machine Learning Research (JMLR), vol.12, pp.2563-2581, 2011.

Learning with group invariant features: A kernel perspective, Advances in Neural Information Processing Systems (NIPS), 2015.

Kernel mean embedding of distributions: A review and beyond, Foundations and Trends in Machine Learning, vol.10, pp.1-141, 2017.

SCOP: a structural classification of proteins database for the investigation of sequences and structures, Journal of Molecular Biology, vol.247, issue.4, pp.536-540, 1995.

Bayesian learning for neural networks, 1996.

Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009. URL: https://hal.archives-ouvertes.fr/hal-00976649

Introductory Lectures on Convex Optimization, 2004.

Iterate averaging as regularization for stochastic gradient descent, Conference on Learning Theory (COLT), 2018.

Norm-based capacity control in neural networks, Conference on Learning Theory (COLT), 2015.

In search of the real inductive bias: On the role of implicit regularization in deep learning, Proceedings of the International Conference on Learning Representations (ICLR), 2015.

Exploring generalization in deep learning, Advances in Neural Information Processing Systems (NIPS), 2017.

A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

The role of overparametrization in generalization of neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

Bayesian deep convolutional networks with many channels are Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

Deep roto-translation scattering for object classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

Scaling the scattering transform: Deep hybrid networks, International Conference on Computer Vision (ICCV), 2017. URL: https://hal.archives-ouvertes.fr/hal-01495734

Transformation pursuit for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. URL: https://hal.archives-ouvertes.fr/hal-00979464

Approximation theory of the MLP model in neural networks, Acta Numerica, vol.8, pp.143-195, 1999.

Certified defenses against adversarial examples, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Random features for large-scale kernel machines, Advances in Neural Information Processing Systems (NIPS), 2007.

Local group invariant representations via orbit embeddings, International Conference on Artificial Intelligence and Statistics, 2017.

Early stopping and non-parametric regression: an optimal data-dependent stopping rule, Journal of Machine Learning Research, vol.15, issue.1, pp.335-366, 2014.

A stochastic approximation method, The Annals of Mathematical Statistics, pp.400-407, 1951.

Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Are loss functions all the same?, Neural Computation, vol.16, issue.5, pp.1063-1076, 2004.

The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, p.386, 1958.

ℓ1 regularization in infinite dimensional feature spaces, Conference on Learning Theory (COLT), 2007.

Stabilizing training of generative adversarial networks through regularization, Advances in Neural Information Processing Systems (NIPS), 2017.

Adversarially robust training through structured gradient regularization, 2018.

Generalization properties of learning with random features, Advances in Neural Information Processing Systems, pp.3215-3225, 2017.

Less is more: Nyström computational regularization, Advances in Neural Information Processing Systems (NIPS), 2015.

Integral transforms, reproducing kernels and their applications, vol.369, 1997.

Provably robust deep learning via adversarially trained smoothed classifiers, Advances in Neural Information Processing Systems (NeurIPS), 2019.

How do infinite width bounded norm networks look in function space?, Conference on Learning Theory (COLT), 2019.

Boosting: Foundations and algorithms, 2012.

Boosting the margin: A new explanation for the effectiveness of voting methods, The Annals of Statistics, vol.26, pp.1651-1686, 1998.

Adversarially robust generalization requires more data, Advances in Neural Information Processing Systems (NeurIPS), 2018.

Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1, pp.83-112, 2017. URL: https://hal.archives-ouvertes.fr/hal-00860051

Positive definite functions on spheres, Duke Mathematical Journal, vol.9, issue.1, pp.96-108, 1942.

Support Vector Learning, 1997.

Learning with kernels: support vector machines, regularization, optimization, and beyond, 2001.

Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, vol.10, issue.5, pp.1299-1319, 1998.

The singular values of convolutional layers, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

SDCA without Duality, Regularization, and Individual Convexity, International Conference on Machine Learning (ICML), 2016.

Understanding machine learning: From theory to algorithms, 2014.

Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research (JMLR), vol.14, pp.567-599, 2013.

Learning kernel-based halfspaces with the 0-1 loss, SIAM Journal on Computing, vol.40, issue.6, pp.1623-1646, 2011.

Kernel methods for pattern analysis, 2004.

Rotation, scaling and deformation invariant scattering for texture discrimination, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

Transformation invariance in pattern recognition: tangent distance and tangent propagation, Neural Networks: Tricks of the Trade, pp.239-274, 1998. URL: https://hal.archives-ouvertes.fr/halshs-00009505

First-order adversarial vulnerability of neural networks and input dimension, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Very deep convolutional networks for large-scale image recognition, Proceedings of the International Conference on Learning Representations (ICLR), 2014.

Certifying some distributional robustness with principled adversarial training, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Estimating the approximation error in learning theory, Analysis and Applications, vol.1, issue.01, pp.17-41, 2003.

Mathematics of the neural response, Foundations of Computational Mathematics, vol.10, issue.1, pp.67-91, 2010.

Sparse greedy matrix approximation for machine learning, Proceedings of the International Conference on Machine Learning (ICML), 2000.

Regularization with dot-product kernels, Advances in Neural Information Processing Systems (NIPS), 2001.

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks, IEEE Transactions on Information Theory, vol.65, issue.2, pp.742-769, 2018.

The implicit bias of gradient descent on separable data, Journal of Machine Learning Research (JMLR), vol.19, issue.1, pp.2822-2878, 2018.

On the empirical estimation of integral probability metrics, Electronic Journal of Statistics, vol.6, pp.1550-1599, 2012.

Harmonic Analysis: Real-variable Methods, Orthogonality, and Oscillatory Integrals, 1993.

Support vector machines, 2008.

Learning with hierarchical Gaussian kernels, 2016.

Intriguing properties of neural networks, International Conference on Learning Representations (ICLR), 2014.

Rethinking the inception architecture for computer vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Margins, shrinkage and boosting, Proceedings of the International Conference on Machine Learning (ICML), 2013.

Statistics of natural image categories, Network: Computation in Neural Systems, vol.14, pp.391-412, 2003.

Local geometry of deformable templates, SIAM Journal on Mathematical Analysis, vol.37, issue.1, pp.17-59, 2005.

Robustness may be at odds with accuracy, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

Introduction to Nonparametric Estimation, 2008.

A theory of the learnable, Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pp.436-445, 1984.

A Gene-Expression Signature as a Predictor of Survival in Breast Cancer, New England Journal of Medicine, vol.347, issue.25, 1999.

Learning with marginalized corrupted features, International Conference on Machine Learning (ICML), 2013.

The nature of statistical learning theory, 2000.

On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, vol.16, p.264, 1971.

Machine learning with kernel methods, course in the "Mathématiques, Vision, Apprentissage" Master, ENS Cachan, 2017.

Distance-based classification with Lipschitz functions, Journal of Machine Learning Research (JMLR), vol.5, pp.669-695, 2004.

Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014.

Spline models for observational data, vol.59, 1990.

High-dimensional statistics: A non-asymptotic viewpoint, vol.48, 2019.

Regularization matters: Generalization and optimization of neural nets vs. their induced kernel, Advances in Neural Information Processing Systems (NeurIPS), 2019.

A mathematical theory of deep convolutional neural networks for feature extraction, IEEE Transactions on Information Theory, vol.64, issue.3, pp.1845-1866, 2018.

Computing with infinite networks, Advances in Neural Information Processing Systems (NIPS), 1997.

Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001.

Gradient dynamics of shallow low-dimensional ReLU networks, Advances in Neural Information Processing Systems (NeurIPS), 2019.

The marginal value of adaptive gradient methods in machine learning, Advances in Neural Information Processing Systems (NIPS), 2017.

Provable defenses against adversarial examples via the convex outer adversarial polytope, Proceedings of the International Conference on Machine Learning (ICML), 2018.

Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research (JMLR), vol.11, pp.2543-2596, 2010.

A proximal stochastic gradient method with progressive variance reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.

Diverse neural network learns true target functions, Proceedings of the International Conference on Artificial Intelligence and Statistics, 2017.

Robust regression and lasso, Advances in Neural Information Processing Systems (NIPS), 2009.

Robustness and regularization of support vector machines, Journal of Machine Learning Research (JMLR), vol.10, pp.1485-1510, 2009.

Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation, 2019.

A fine-grained spectral perspective on neural networks, 2019.

On early stopping in gradient descent learning, Constructive Approximation, vol.26, issue.2, pp.289-315, 2007.

Rademacher complexity for adversarially robust generalization, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Spectral norm regularization for improving the generalizability of deep learning, 2017.

Wide residual networks, 2016. URL: https://hal.archives-ouvertes.fr/hal-01832503

Understanding deep learning requires rethinking generalization, Proceedings of the International Conference on Learning Representations (ICLR), 2017.

Are all layers created equal?, 2019.

Improved Nyström low-rank approximation and error analysis, Proceedings of the International Conference on Machine Learning (ICML), 2008.

Boosting with early stopping: Convergence and consistency, The Annals of Statistics, vol.33, issue.4, pp.1538-1579, 2005.

ℓ1-regularized neural networks are improperly learnable in polynomial time, International Conference on Machine Learning (ICML), 2016.

Convexified convolutional neural networks, International Conference on Machine Learning (ICML), 2017.

Lightweight stochastic optimization for minimizing finite sums with infinite data, Proceedings of the International Conference on Machine Learning (ICML), 2018.

Improving the robustness of deep neural networks via stability training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Stochastic gradient descent optimizes overparameterized deep ReLU networks, Machine Learning, 2019.