Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn., 2009.
DOI : 10.1561/2200000006

L. Bo, K. Lai, X. Ren, and D. Fox, Object recognition with hierarchical kernel descriptors, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995719

L. Bo, X. Ren, and D. Fox, Kernel descriptors for visual recognition, Adv. NIPS, 2010.

L. Bo, X. Ren, and D. Fox, Unsupervised Feature Learning for RGB-D Based Object Recognition, Experimental Robotics, 2013.
DOI : 10.1007/978-3-319-00065-7_27

L. Bo and C. Sminchisescu, Efficient match kernel between sets of features for visual recognition, Adv. NIPS, 2009.

J. V. Bouvrie, L. Rosasco, and T. Poggio, On invariance in hierarchical models, Adv. NIPS, 2009.

J. Bruna and S. Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.
DOI : 10.1109/TPAMI.2012.230

R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A Limited Memory Algorithm for Bound Constrained Optimization, SIAM Journal on Scientific Computing, vol.16, issue.5, pp.1190-1208, 1995.
DOI : 10.1137/0916069

Y. Cho and L. K. Saul, Large-Margin Classification in Infinite Neural Networks, Neural Computation, vol.22, issue.10, 2010.

D. Ciresan, U. Meier, and J. Schmidhuber, Multi-column deep neural networks for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248110

A. Coates and A. Y. Ng, Selecting receptive fields in deep networks, Adv. NIPS, 2011.

A. Coates, A. Y. Ng, and H. Lee, An analysis of single-layer networks in unsupervised feature learning, Proc. AISTATS, 2011.

D. Decoste and B. Schölkopf, Training invariant support vector machines, Machine Learning, vol.46, issue.1/3, pp.161-190, 2002.
DOI : 10.1023/A:1012454411458

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., DeCAF: A deep convolutional activation feature for generic visual recognition, 2013.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res., vol.9, pp.1871-1874, 2008.

R. Gens and P. Domingos, Discriminative learning of sum-product networks, Adv. NIPS, 2012.

K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, What is the best multi-stage architecture for object recognition?, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459469

A. Krizhevsky and G. Hinton, Learning multiple layers of features from tiny images, Tech. Rep., 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. NIPS, 2012.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

B. A. Olshausen and D. J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, vol.381, issue.6583, pp.607-609, 1996.
DOI : 10.1038/381607a0

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Adv. NIPS, 2007.

M. Ranzato, F. Huang, Y. Boureau, and Y. LeCun, Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383157

J. Shawe-Taylor and N. Cristianini, Kernel methods for pattern analysis, 2004.
DOI : 10.1017/CBO9780511809682

K. Sohn and H. Lee, Learning invariant representations with local transformations, Proc. ICML, 2012.

K. Swersky, J. Snoek, and R. P. Adams, Multi-task Bayesian optimization, Adv. NIPS, 2013.

G. Wahba, Spline models for observational data, SIAM, 1990.
DOI : 10.1137/1.9781611970128

L. Wan, M. D. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, Regularization of neural networks using DropConnect, Proc. ICML, 2013.

C. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Adv. NIPS, 2001.

M. D. Zeiler and R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, Proc. ICLR, 2013.

M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, Proc. ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_53

URL : http://arxiv.org/abs/1311.2901