G. Aupy, J. Herrmann, P. Hovland, and Y. Robert, Optimal multistage algorithm for adjoint computation, SIAM Journal on Scientific Computing, vol.38, issue.3, pp.C232-C255, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01147155

B. Chang, L. Meng, E. Haber, L. Ruthotto, D. Begert et al., Reversible architectures for arbitrarily deep residual neural networks, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

T. Chen, B. Xu, C. Zhang, and C. Guestrin, Training deep nets with sublinear memory cost, arXiv preprint arXiv:1604.06174, 2016.

D. Das, S. Avancha, D. Mudigere, K. Vaidyanathan, S. Sridharan et al., Distributed deep learning using synchronous stochastic gradient descent, 2016.

J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin et al., Large scale distributed deep networks, Advances in Neural Information Processing Systems, pp.1223-1231, 2012.

N. Dryden, N. Maruyama, T. Benson, T. Moon, M. Snir et al., Improving strong-scaling of CNN training by exploiting finer-grained parallelism, IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.

A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse, The reversible residual network: Backpropagation without storing activations, Advances in Neural Information Processing Systems, pp.2214-2224, 2017.

A. Griewank and A. Walther, Algorithm 799: Revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation, ACM Transactions on Mathematical Software (TOMS), vol.26, issue.1, pp.19-45, 2000.

A. Griewank and A. Walther, Evaluating derivatives: principles and techniques of algorithmic differentiation, vol.105, SIAM, 2008.

A. Griewank, On automatic differentiation, Mathematical Programming: Recent Developments and Applications, pp.83-107, 1989.

A. Gruslys, R. Munos, I. Danihelka, M. Lanctot, and A. Graves, Memory-efficient backpropagation through time, Advances in Neural Information Processing Systems, pp.4125-4133, 2016.