D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A Learning Algorithm for Boltzmann Machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985.
DOI : 10.1207/s15516709cog0901_7

Y. Bengio and Y. Grandvalet, No unbiased estimator of the variance of k-fold cross-validation, Journal of Machine Learning Research, vol.5, pp.1089-1105, 2004.

Y. Bengio, P. Lamblin, V. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Proc. of NIPS'07, pp.153-160, 2007.

Y. Bengio and Y. LeCun, Scaling learning algorithms towards AI, in Large-Scale Kernel Machines, 2007.

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553380

T. G. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, Neural Computation, vol.10, issue.7, pp.1895-1923, 1998.
DOI : 10.1162/089976698300017197
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.2536

B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability, 1993.
DOI : 10.1007/978-1-4899-4541-9

D. Erhan, P. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, The difficulty of training deep architectures and the effect of unsupervised pre-training, Proc. of AISTATS'09, 2009.

T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2001.

G. E. Hinton, Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation, vol.14, issue.8, pp.1771-1800, 2002.
DOI : 10.1162/089976602760128018

G. E. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural Computation, pp.1527-1554, 2006.

G. E. Hinton and R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006.
DOI : 10.1126/science.1127647

H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, Exploring strategies for training deep neural networks, Journal of Machine Learning Research, vol.10, pp.1-40, 2009.

H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.473-480, 2007.
DOI : 10.1145/1273496.1273556

N. Le Roux and Y. Bengio, Representational power of restricted Boltzmann machines and deep belief networks, Neural Computation, vol.20, issue.6, pp.1631-1649, 2008.

H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.77, 2009.
DOI : 10.1145/1553374.1553453

H. Paugam-Moisy, Parallel neural computing based on network duplicating, in Parallel Algorithms for Digital Image Processing, Computer Vision and Neural Networks, pp.305-340, 1993.

R. Tibshirani, G. Walther, and T. Hastie, Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.63, issue.2, pp.411-423, 2001.
DOI : 10.1111/1467-9868.00293

M. Welling and G. E. Hinton, A New Learning Algorithm for Mean Field Boltzmann Machines, Proc. of the International Conference on Artificial Neural Networks (ICANN), 2002.
DOI : 10.1007/3-540-46084-5_57