Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.

D. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), arXiv preprint arXiv:1511.07289, 2015.

Y. Gal and Z. Ghahramani, A theoretically grounded application of dropout in recurrent neural networks, Advances in Neural Information Processing Systems, vol.29, pp.1019-1027, 2016.

F. M. Harper and J. A. Konstan, The MovieLens datasets: History and context, ACM Trans. Interact. Intell. Syst., vol.5, issue.4, 2015.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167, 2015.

J. Kiefer and J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Ann. Math. Statist., vol.23, issue.3, pp.462-466, 1952.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014.

G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, Self-normalizing neural networks, Advances in Neural Information Processing Systems, vol.30, 2017.

Y. Koren, The BellKor Solution to the Netflix Grand Prize, Netflix Prize documentation, 2009.

Y. Koren, R. Bell, and C. Volinsky, Matrix factorization techniques for recommender systems, Computer, vol.42, issue.8, pp.30-37, 2009.

A. Krizhevsky, Learning multiple layers of features from tiny images, Technical Report, University of Toronto, 2009.

O. Kuchaiev and B. Ginsburg, Training Deep AutoEncoders for Collaborative Filtering, arXiv preprint arXiv:1708.01715, 2017.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

D. Liang, R. G. Krishnan, M. D. Hoffman, and T. Jebara, Variational autoencoders for collaborative filtering, Proceedings of the 2018 World Wide Web Conference, WWW '18, pp.689-698, 2018.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, vol.26, pp.3111-3119, 2013.

V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning, ICML'10, pp.807-814, 2010.

M. Piotte and M. Chabbert, The Pragmatic Theory Solution to the Netflix Grand Prize, Netflix Prize documentation, 2009.

B. Ramsundar and R. B. Zadeh, TensorFlow for Deep Learning: From Linear Regression to Reinforcement Learning, O'Reilly Media, 2018.

R. Salakhutdinov, A. Mnih, and G. Hinton, Restricted Boltzmann machines for collaborative filtering, Proceedings of the 24th International Conference on Machine Learning, ICML'07, pp.791-798, 2007.

H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of Statistical Planning and Inference, vol.90, issue.2, pp.227-244, 2000.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

G. Takács, I. Pilászy, B. Németh, and D. Tikk, On the Gravity recommendation system, Proc. of KDD Cup Workshop at SIGKDD'07, 13th ACM Int. Conf. on Knowledge Discovery and Data Mining, pp.22-30, 2007.

T. Tieleman and G. Hinton, Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, 2012.

A. Töscher, M. Jahrer, and R. Bell, The BigChaos Solution to the Netflix Grand Prize, Netflix Prize documentation, 2009.

F. Wilcoxon, Individual Comparisons by Ranking Methods, Breakthroughs in Statistics: Methodology and Distribution, pp.196-202, Springer, 1992.