F. Anselmi, L. Rosasco, and T. Poggio, On invariance and selectivity in representation learning, Information and Inference, vol.5, issue.2, pp.134-158, 2016.
DOI : 10.1093/imaiai/iaw009
URL : https://academic.oup.com/imaiai/article-pdf/5/2/134/6990886/iaw009.pdf

F. Anselmi, L. Rosasco, C. Tan, and T. Poggio, Deep convolutional networks are hierarchical kernel machines, 2015.

A. Bietti and J. Mairal, Group invariance and stability to deformations of deep convolutional representations, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01536004

L. Bo, K. Lai, X. Ren, and D. Fox, Object recognition with hierarchical kernel descriptors, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995719
URL : http://www.cs.washington.edu/homes/lfb/paper/cvpr11.pdf

J. Bruna and S. Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.
DOI : 10.1109/TPAMI.2012.230
URL : http://arxiv.org/pdf/1203.1513

J. Bruna, A. Szlam, and Y. LeCun, Learning stable group invariant representations with convolutional networks, 2013.

Y. Cho and L. K. Saul, Kernel methods for deep learning, Advances in Neural Information Processing Systems (NIPS), 2009.

M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier, Parseval networks: Improving robustness to adversarial examples, International Conference on Machine Learning (ICML), 2017.

T. Cohen and M. Welling, Group equivariant convolutional networks, International Conference on Machine Learning (ICML), 2016.

A. Daniely, R. Frostig, and Y. Singer, Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016.

J. Diestel and J. J. Uhl, Vector Measures, 1977.

S. Fine and K. Scheinberg, Efficient SVM training using low-rank kernel representations, Journal of Machine Learning Research (JMLR), vol.2, pp.243-264, 2001.

B. Haasdonk and H. Burkhardt, Invariant kernel functions for pattern analysis and machine learning, Machine Learning, vol.68, issue.1, pp.35-61, 2007.
DOI : 10.1007/s10994-007-5009-7
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-007-5009-7.pdf

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
DOI : 10.1162/neco.1989.1.4.541

S. Mallat, Group Invariant Scattering, Communications on Pure and Applied Mathematics, vol.65, issue.10, pp.1331-1398, 2012.
DOI : 10.1002/cpa.21413
URL : http://arxiv.org/pdf/1101.2286

G. Montavon, M. L. Braun, and K. Müller, Kernel analysis of deep networks, Journal of Machine Learning Research (JMLR), vol.12, pp.2563-2581, 2011.

Y. Mroueh, S. Voinea, and T. A. Poggio, Learning with group invariant features: A kernel perspective, Advances in Neural Information Processing Systems (NIPS), 2015.

K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, Kernel mean embedding of distributions: A review and beyond, Foundations and Trends in Machine Learning, vol.10, issue.1-2, pp.1-141, 2017.
DOI : 10.1561/2200000060

E. Oyallon and S. Mallat, Deep roto-translation scattering for object classification, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298904
URL : http://arxiv.org/pdf/1412.8659

A. Raj, A. Kumar, Y. Mroueh, T. Fletcher, and B. Schölkopf, Local group invariant representations via orbit embeddings, International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

S. Saitoh, Integral transforms, reproducing kernels and their applications, 1997.

I. J. Schoenberg, Positive definite functions on spheres, Duke Mathematical Journal, vol.9, issue.1, pp.96-108, 1942.
DOI : 10.1215/S0012-7094-42-00908-6

B. Schölkopf, Support Vector Learning, 1997.

B. Schölkopf, A. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, vol.10, issue.5, pp.1299-1319, 1998.
DOI : 10.1162/089976698300017467

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2001.

S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms, 2014.
DOI : 10.1017/CBO9781107298019

L. Sifre and S. Mallat, Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.163
URL : http://www.cmapx.polytechnique.fr/~sifre/research/cvpr_13_sifre_mallat_final.pdf

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2014.

A. J. Smola and B. Schölkopf, Sparse greedy matrix approximation for machine learning, Proceedings of the International Conference on Machine Learning (ICML), 2000.

E. M. Stein, Harmonic Analysis: Real-variable Methods, Orthogonality, and Oscillatory Integrals, 1993.

A. Torralba and A. Oliva, Statistics of natural image categories, Network: Computation in Neural Systems, vol.14, issue.3, pp.391-412, 2003.
DOI : 10.1088/0954-898x_14_3_302

C. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001.

Y. Zhang, J. D. Lee, and M. I. Jordan, ℓ1-regularized neural networks are improperly learnable in polynomial time, International Conference on Machine Learning (ICML), 2016.