ImageNet classification with deep convolutional neural networks, NIPS, 2012.
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf
Deep Face Recognition, Proceedings of the British Machine Vision Conference (BMVC), 2015.
DOI : 10.5244/C.29.41
Distributed representations of words and phrases and their compositionality, NIPS, 2013. [Online]. Available: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality
Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90
ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y
URL : http://arxiv.org/pdf/1409.0575
TensorFlow: A system for large-scale machine learning, OSDI, 2016.
Benchmarking State-of-the-Art Deep Learning Software Tools, 2016 7th International Conference on Cloud Computing and Big Data (CCBD), arXiv:1608.07249, 2016.
DOI : 10.1109/CCBD.2016.029
Federated Learning: Strategies for Improving Communication Efficiency, arXiv:1610.05492, 2016.
Privacy-preserving deep learning, CCS, 2015.
DOI : 10.1145/2810103.2813687
Efficient BackProp, Neural Networks: Tricks of the Trade, pp.9-50, 1998.
Online algorithms and stochastic approximations, Online Learning and Neural Networks, 1998.
Staleness-aware async-SGD for distributed deep learning
Model accuracy and runtime tradeoff in distributed deep learning: A systematic study, ICDM, 2016.
Asynchronous parallel stochastic gradient for nonconvex optimization, NIPS, 2015. [Online]. Available: http://papers.nips.cc/paper/5751-asynchronous-parallel-stochastic-gradient-for-nonconvex-optimization.pdf
A. Odena, Faster asynchronous SGD, arXiv:1601.04033, 2016.
Hogwild!: A lock-free approach to parallelizing stochastic gradient descent, NIPS, 2011.
The MNIST database of handwritten digits, 1998.
Keras, 2015.
Distributed deep learning on edge-devices: feasibility via adaptive compression
URL : https://hal.archives-ouvertes.fr/hal-01622580
Learning representations by back-propagating errors, Nature, vol.323, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0
Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.
Adam: A method for stochastic optimization, arXiv:1412.6980, 2014.
Project Adam: Building an efficient and scalable deep learning training system, OSDI, 2014.
Revisiting distributed synchronous SGD, International Conference on Learning Representations Workshop Track, arXiv:1604.00981, 2016.
Federated optimization: Distributed optimization beyond the datacenter
Federated learning of deep networks using model averaging, 2016.