F. Anselmi, L. Rosasco, and T. Poggio, On invariance and selectivity in representation learning, Information and Inference, vol.5, issue.2, pp.134-158, 2016.
DOI : 10.1093/imaiai/iaw009
URL : http://arxiv.org/abs/1503.05938

F. Anselmi, L. Rosasco, C. Tan, and T. Poggio, Deep convolutional networks are hierarchical kernel machines, 2015.

F. Bach, On the equivalence between kernel quadrature rules and random feature expansions, Journal of Machine Learning Research (JMLR), vol.18, pp.1-38, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01118276

L. Bo, K. Lai, X. Ren, and D. Fox, Object recognition with hierarchical kernel descriptors, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995719
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.225.931

J. Bruna and S. Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.
DOI : 10.1109/TPAMI.2012.230
URL : http://arxiv.org/abs/1203.1513

J. Bruna, A. Szlam, and Y. LeCun, Learning stable group invariant representations with convolutional networks, 2013.

Y. Cho and L. K. Saul, Kernel methods for deep learning, Advances in Neural Information Processing Systems (NIPS), 2009.

M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier, Parseval networks: Improving robustness to adversarial examples, International Conference on Machine Learning (ICML), 2017.

T. Cohen and M. Welling, Group equivariant convolutional networks, International Conference on Machine Learning (ICML), 2016.

A. Daniely, R. Frostig, V. Gupta, and Y. Singer, Random features for compositional kernels, 2017.

A. Daniely, R. Frostig, and Y. Singer, Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016.

J. Diestel and J. J. Uhl, Vector Measures, 1977.

S. Fine and K. Scheinberg, Efficient SVM training using low-rank kernel representations, Journal of Machine Learning Research (JMLR), vol.2, pp.243-264, 2001.

Q. Le, T. Sarlós, and A. Smola, Fastfood - approximating kernel expansions in loglinear time, Proceedings of the International Conference on Machine Learning (ICML), 2013.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
DOI : 10.1162/neco.1989.1.4.541

S. Mallat, Group Invariant Scattering, Communications on Pure and Applied Mathematics, vol.65, issue.10, pp.1331-1398, 2012.
DOI : 10.1002/cpa.21413
URL : http://arxiv.org/abs/1101.2286

G. Montavon, M. L. Braun, and K. Müller, Kernel analysis of deep networks, Journal of Machine Learning Research (JMLR), vol.12, pp.2563-2581, 2011.

Y. Mroueh, S. Voinea, and T. A. Poggio, Learning with group invariant features: A kernel perspective, Advances in Neural Information Processing Systems (NIPS), 2015.

K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, Kernel mean embedding of distributions: A review and beyond, 2016.

E. Oyallon and S. Mallat, Deep roto-translation scattering for object classification, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298904
URL : http://arxiv.org/abs/1412.8659

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems (NIPS), 2007.

A. Raj, A. Kumar, Y. Mroueh, P. T. Fletcher, and B. Schölkopf, Local group invariant representations via orbit embeddings, 2016.

S. Saitoh, Integral transforms, reproducing kernels and their applications, 1997.

I. J. Schoenberg, Positive definite functions on spheres, Duke Mathematical Journal, vol.9, issue.1, pp.96-108, 1942.
DOI : 10.1215/S0012-7094-42-00908-6

B. Schölkopf, Support Vector Learning, 1997.

B. Schölkopf, A. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, vol.10, issue.5, pp.1299-1319, 1998.
DOI : 10.1162/089976698300017467

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2001.

L. Sifre and S. Mallat, Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.163
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.397.6107

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2014.

A. J. Smola and B. Schölkopf, Sparse greedy matrix approximation for machine learning, Proceedings of the International Conference on Machine Learning (ICML), 2000.

E. M. Stein, Harmonic Analysis: Real-variable Methods, Orthogonality, and Oscillatory Integrals, 1993.

I. Steinwart, P. Thomann, and N. Schmid, Learning with hierarchical Gaussian kernels, 2016.

A. Torralba and A. Oliva, Statistics of natural image categories, Network: Computation in Neural Systems, pp.391-412, 2003.

C. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001.

Y. Zhang, J. D. Lee, and M. I. Jordan, ℓ1-regularized neural networks are improperly learnable in polynomial time, International Conference on Machine Learning (ICML), 2016.