H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.473-480, 2007.
DOI : 10.1145/1273496.1273556

G. E. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox, A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation, PLoS Computational Biology, vol.5, issue.11, p.e1000579, 2009.
DOI : 10.1371/journal.pcbi.1000579

A. Coates, H. Lee, and A. Ng, An analysis of single-layer networks in unsupervised feature learning, NIPS Deep Learning and Unsupervised Feature Learning Workshop, 2010.

A. Coates and A. Y. Ng, The importance of encoding versus training with sparse coding and vector quantization, Proceedings of the Twenty-eighth International Conference on Machine Learning (ICML-11), 2011.

F. Hutter, Automated Configuration of Algorithms for Solving Hard Computational Problems, 2009.

F. Hutter, H. Hoos, and K. Leyton-Brown, Sequential Model-Based Optimization for General Algorithm Configuration, Learning and Intelligent Optimization (LION 5), pp.507-523, 2011.
DOI : 10.1007/978-3-642-25566-3_40

D. R. Jones, A taxonomy of global optimization methods based on response surfaces, Journal of Global Optimization, vol.21, issue.4, pp.345-383, 2001.
DOI : 10.1023/A:1012771025575

J. Villemonteix, E. Vazquez, and E. Walter, An informational approach to the global optimization of expensive-to-evaluate functions, Journal of Global Optimization, vol.44, issue.4, pp.509-534, 2009.
DOI : 10.1007/s10898-008-9354-2
URL : https://hal.archives-ouvertes.fr/hal-00354262

N. Srinivas, A. Krause, S. Kakade, and M. Seeger, Gaussian process optimization in the bandit setting: No regret and experimental design, ICML, 2010.

J. Mockus, V. Tiesis, and A. Zilinskas, The application of Bayesian methods for seeking the extremum, Towards Global Optimization, pp.117-129, 1978.

D. Ginsbourger, D. Dupuy, A. Badea, L. Carraro, and O. Roustant, A note on the choice and the estimation of Kriging models for the analysis of deterministic computer experiments, Applied Stochastic Models in Business and Industry, vol.25, issue.2, pp.115-131, 2009.
DOI : 10.1002/asmb.741
URL : https://hal.archives-ouvertes.fr/hal-00270173

R. Bardenet and B. Kégl, Surrogating the surrogate: accelerating Gaussian-process-based global optimization with a mixture cross-entropy algorithm, ICML, 2010.

P. Larrañaga and J. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, 2001.
DOI : 10.1007/978-1-4615-1539-5

N. Hansen, The CMA evolution strategy: a comparing review, Towards a New Evolutionary Computation: Advances on Estimation of Distribution Algorithms, pp.75-102, 2006.

J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, The Learning Workshop (Snowbird), 2011.

A. Hyvärinen and E. Oja, Independent component analysis: algorithms and applications, Neural Networks, vol.13, issue.4-5, pp.411-430, 2000.
DOI : 10.1016/S0893-6080(00)00026-5

J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, JMLR, 2012.

C. Bishop, Neural networks for pattern recognition, 1995.

J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu et al., Theano: a CPU and GPU math expression compiler, Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.