R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, Collaborative hyperparameter tuning, 30th International Conference on Machine Learning, pp.199-207, 2013.
URL : https://hal.archives-ouvertes.fr/in2p3-00907381

Y. Bengio, I. J. Goodfellow, and A. Courville, Deep learning. An MIT Press book in preparation, 2015.

J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, The Journal of Machine Learning Research, vol.13, issue.1, pp.281-305, 2012.

J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, Algorithms for hyperparameter optimization, Advances in Neural Information Processing Systems, pp.2546-2554, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00642998

H. Bourlard and Y. Kamp, Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, vol.59, issue.4-5, pp.291-294, 1988.
DOI : 10.1007/BF00332918

E. Brochu, V. M. Cora, and N. de Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, arXiv preprint, 2010.

K. Eggensperger, M. Feurer, F. Hutter, J. Bergstra, J. Snoek et al., Towards an empirical foundation for assessing Bayesian optimization of hyperparameters, NIPS Workshop on Bayesian Optimization in Theory and Practice, 2013.

G. E. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527
URL : http://www.cs.berkeley.edu/~ywteh/research/ebm/nc2006.pdf

F. Hutter, H. H. Hoos, and K. Leyton-Brown, Sequential Model-Based Optimization for General Algorithm Configuration, Learning and Intelligent Optimization, pp.507-523, 2011.
DOI : 10.1007/978-3-642-25566-3_40
URL : http://www.cs.ubc.ca/spider/hutter/papers/10-TR-SMAC.pdf

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional Architecture for Fast Feature Embedding, Proceedings of the ACM International Conference on Multimedia, MM '14, pp.675-678, 2014.
DOI : 10.1145/2647868.2654889

A. Krizhevsky and G. Hinton, Learning multiple layers of features from tiny images, Technical report, University of Toronto, 2009.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
DOI : 10.1162/neco.1989.1.4.541

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

R. L. Plackett and J. P. Burman, The design of optimum multifactorial experiments, Biometrika, vol.33, issue.4, pp.305-325, 1946.
DOI : 10.1093/biomet/33.4.305

J. Snoek, H. Larochelle, and R. P. Adams, Practical Bayesian optimization of machine learning algorithms, Advances in Neural Information Processing Systems, pp.2951-2959, 2012.

C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '13, pp.847-855, 2013.
DOI : 10.1145/2487575.2487629

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, The Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.