Periodic checkpointing in PyTorch, 2018.

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis et al., TensorFlow: A system for large-scale machine learning, 12th USENIX Symposium on Operating Systems Design and Implementation, pp.265-283, 2016.

O. Beaumont, L. Eyraud-Dubois, J. Herrmann, A. Joly, and A. Shilova, Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02352969

O. Beaumont, L. Eyraud-Dubois, and A. Shilova, Optimal GPU-CPU Offloading Strategies for Deep Neural Network Training, Proceedings of Euro-Par 2020, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02316266

O. Beaumont, J. Herrmann, G. Pallez, and A. Shilova, Optimal Memory-aware Backpropagation of Deep Join Networks, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02401105

J. Boyar, L. Epstein, and A. Levin, Tight results for Next Fit and Worst Fit with resource augmentation, Theoretical Computer Science, vol.411, pp.2572-2580, 2010.

B. Chang, L. Meng, E. Haber, L. Ruthotto, D. Begert et al., Reversible architectures for arbitrarily deep residual neural networks, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

C. Chen, C. Yang, and H. Cheng, Efficient and robust parallel DNN training through model parallelism on multi-GPU platform, 2018.

T. Chen, B. Xu, C. Zhang, and C. Guestrin, Training deep nets with sublinear memory cost, 2016.

C. Chu, P. Kousha, A. A. Awan, K. S. Khorassani, H. Subramoni et al., NV-Group: Link-efficient reduction for distributed deep learning on modern dense GPU systems, Proceedings of the 34th ACM International Conference on Supercomputing, pp.1-12, 2020.

J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin et al., Large scale distributed deep networks, Advances in Neural Information Processing Systems, pp.1223-1231, 2012.

N. Dryden, N. Maruyama, T. Benson, T. Moon, M. Snir et al., Improving strong-scaling of CNN training by exploiting finer-grained parallelism, IEEE International Parallel and Distributed Processing Symposium, 2019.

N. Dryden, N. Maruyama, T. Moon, T. Benson, M. Snir et al., Channel and filter parallelism for large-scale CNN training, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p.10, 2019.

J. Feng and D. Huang, Optimal gradient checkpoint search for arbitrary computation graphs, 2018.

M. R. Garey and D. S. Johnson, Computers and Intractability, vol.174, 1979.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.249-256, 2010.

A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse, The reversible residual network: Backpropagation without storing activations, Advances in Neural Information Processing Systems, pp.2214-2224, 2017.

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch SGD: Training ImageNet in 1 hour, 2017.

A. Gruslys, R. Munos, I. Danihelka, M. Lanctot, and A. Graves, Memory-efficient backpropagation through time, Advances in Neural Information Processing Systems, pp.4125-4133, 2016.

S. Han, H. Mao, and W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, 2015.

R. Hemenway, High burst-mode optical interconnect for high performance computing systems, Conference on Lasers and Electro-Optics, vol.1, p.4, 2004.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, 2017.

Y. Huang, Y. Cheng, A. Bapna, O. Firat, D. Chen et al., GPipe: Efficient training of giant neural networks using pipeline parallelism, Advances in Neural Information Processing Systems, pp.103-112, 2019.

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, Quantized neural networks: Training neural networks with low precision weights and activations, The Journal of Machine Learning Research, vol.18, pp.6869-6898, 2017.

P. Jain, A. Jain, A. Nrusimha, A. Gholami, P. Abbeel et al., Checkmate: Breaking the memory wall with optimal tensor rematerialization, 2019.

N. Kukreja, A. Shilova, O. Beaumont, J. Huckelheim, N. Ferrier et al., Training on the edge: The why and the how, 1st Workshop on Parallel AI and Systems for the Edge, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02069728

R. Kumar, M. Purohit, Z. Svitkina, E. Vee, and J. Wang, Efficient rematerialization for deep networks, Advances in Neural Information Processing Systems, pp.15146-15155, 2019.

M. Kusumoto, T. Inoue, G. Watanabe, T. Akiba, and M. Koyama, A graph theoretic framework of recomputation algorithms for memory-efficient backpropagation, 2019.

J. Liu, W. Yu, J. Wu, D. Buntinas, D. K. Panda et al., Microbenchmark performance comparison of high-speed cluster interconnects, IEEE Micro, vol.24, no.1, pp.42-51, 2004.

D. Narayanan, A. Harlap, A. Phanishayee, V. Seshadri, N. R. Devanur et al., PipeDream: generalized pipeline parallelism for DNN training, Proceedings of the 27th ACM Symposium on Operating Systems Principles, pp.1-15, 2019.

D. Narayanan, A. Phanishayee, K. Shi, X. Chen, and M. Zaharia, Memory-efficient pipeline-parallel DNN training, 2020.

S. Nickel, C. Steinhardt, H. Schlenker, W. Burkart, and M. Reuter-Oppermann, IBM ILOG CPLEX Optimization Studio, Angewandte Optimierung mit IBM ILOG CPLEX Optimization Studio, pp.9-23, 2020.

T. Paine, H. Jin, J. Yang, Z. Lin, and T. Huang, GPU asynchronous stochastic gradient descent to speed up neural network training, 2013.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, 2017.

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, XNOR-Net: ImageNet classification using binary convolutional neural networks, European Conference on Computer Vision, pp.525-542, 2016.

M. Rhu, N. Gimelshein, J. Clemons, A. Zulfiqar, S. W. Keckler et al., vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design, The 49th Annual IEEE/ACM International Symposium on Microarchitecture, p.18, 2016.

S. B. Shriram, A. Garg, and P. Kulkarni, Dynamic memory management for GPU-based training of deep neural networks, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.

J. Tarnawski, A. Phanishayee, N. R. Devanur, D. Mahajan, and F. N. Paravecino, Efficient algorithms for device placement of DNN graph operators, 2020.

Y. You, Z. Zhang, J. Demmel, K. Keutzer, and C. Hsieh, ImageNet training in minutes, 2018.

J. Zhan and J. Zhang, Pipe-torch: Pipeline-based distributed deep learning in a GPU cluster with heterogeneous networking, Seventh International Conference on Advanced Cloud and Big Data (CBD), pp.55-60, 2019.

X. Zhang, X. Zhou, M. Lin, and J. Sun, ShuffleNet: An extremely efficient convolutional neural network for mobile devices, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6848-6856, 2018.

M. Zinkevich, M. Weimer, L. Li, and A. J. Smola, Parallelized stochastic gradient descent, Advances in Neural Information Processing Systems, pp.2595-2603, 2010.