C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, CVPR, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, CVPR, 2016.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, CVPR, 2017.

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch SGD: Training ImageNet in 1 hour, 2017.

Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, ImageNet training in minutes, 2017.

S. Bulò, L. Porzi, and P. Kontschieder, In-place activated batchnorm for memory-optimized training of DNNs, CVPR, pp. 5639-5647, 2018.

G. Pleiss, D. Chen, G. Huang, T. Li, L. van der Maaten et al., Memory-efficient implementation of DenseNets, 2017.

J. Carranza-Rojas, H. Goëau, P. Bonnet, E. Mata-Montero, and A. Joly, Going deeper in the automated identification of herbarium specimens, BMC Evolutionary Biology, vol. 17, no. 1, p. 181, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01580070

S. Ghadai, X. Lee, A. Balu, S. Sarkar, and A. Krishnamurthy, Multi-level 3D CNN for learning multi-scale spatial features, CVPR Workshops, 2019.

Y. Feng, Z. Zhang, X. Zhao, R. Ji, and Y. Gao, GVCNN: Group-view convolutional neural networks for 3D shape recognition, CVPR, pp. 264-272, 2018.

H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, Multi-view convolutional neural networks for 3D shape recognition, ICCV, pp. 945-953, 2015.

K. Hara, H. Kataoka, and Y. Satoh, Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?, 2018.

Z. Shou, J. Chan, A. Zareian, K. Miyazawa, and S. Chang, CDC: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos, CVPR, pp. 5734-5743, 2017.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.

S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry, How does batch normalization help optimization?, Advances in Neural Information Processing Systems, pp. 2483-2493, 2018.

M. Kusumoto, T. Inoue, G. Watanabe, T. Akiba, and M. Koyama, A graph theoretic framework of recomputation algorithms for memory-efficient backpropagation, 2019.

M. Rhu, N. Gimelshein, J. Clemons, A. Zulfiqar, and S. W. Keckler, vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design, The 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), p. 18, 2016.

A. Garg and P. Kulkarni, Dynamic memory management for GPU-based training of deep neural networks, IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.

M. Zinkevich, M. Weimer, L. Li, and A. J. Smola, Parallelized stochastic gradient descent, Advances in Neural Information Processing Systems, pp. 2595-2603, 2010.

T. Paine, H. Jin, J. Yang, Z. Lin, and T. Huang, GPU asynchronous stochastic gradient descent to speed up neural network training, 2013.

J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin et al., Large scale distributed deep networks, Advances in Neural Information Processing Systems, pp. 1223-1231, 2012.

A. Griewank, On automatic differentiation, Mathematical Programming: Recent Developments and Applications, vol. 6, pp. 83-107, 1989.

A. Adcroft, J. Campin, S. Dutkiewicz, C. Evangelinos, D. Ferreira et al., MITgcm user manual, 2008.

P. Brubaker, Engineering Design Optimization using Calculus Level Methods, 2016.

A. Griewank and A. Walther, Algorithm 799: Revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation, ACM Transactions on Mathematical Software (TOMS), vol. 26, no. 1, pp. 19-45, 2000.

A. Gruslys, R. Munos, I. Danihelka, M. Lanctot, and A. Graves, Memory-efficient backpropagation through time, Advances in Neural Information Processing Systems, pp. 4125-4133, 2016.

T. Chen, B. Xu, C. Zhang, and C. Guestrin, Training deep nets with sublinear memory cost, 2016.

N. Kukreja, J. Hückelheim, and G. J. Gorman, Backpropagation for long sequences: beyond memory constraints with constant overheads, 2018.

O. Beaumont, L. Eyraud-dubois, J. Herrmann, A. Joly, and A. Shilova, Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory, Inria Bordeaux Sud-Ouest, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02352969

R. Kumar, M. Purohit, Z. Svitkina, E. Vee, and J. Wang, Efficient rematerialization for deep networks, Advances in Neural Information Processing Systems, 2019.

J. Feng and D. Huang, Optimal gradient checkpoint search for arbitrary computation graphs, 2018.

P. Jain, A. Jain, A. Nrusimha, A. Gholami, P. Abbeel et al., Checkmate: Breaking the memory wall with optimal tensor rematerialization, 2019.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, 2017.

Periodic checkpointing in PyTorch, 2018.

M. Rhu, M. O'Connor, N. Chatterjee, J. Pool, Y. Kwon et al., Compressing DMA engine: Leveraging activation sparsity for training deep neural networks, IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 78-91, 2018.

Y. Kwon and M. Rhu, Beyond the memory wall: A case for memory-centric HPC system for deep learning, 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 148-161, 2018.

C. Meng, M. Sun, J. Yang, M. Qiu, and Y. Gu, Training deeper models by GPU memory optimization on TensorFlow, Proc. of ML Systems Workshop in NIPS, 2017.

T. D. Le, H. Imai, Y. Negishi, and K. Kawachiya, TFLMS: Large model support in TensorFlow by graph rewriting, 2018.

J. Zhang, S. H. Yeung, Y. Shu, B. He, and W. Wang, Efficient memory management for GPU-based deep learning systems, 2019.

L. Wang, J. Ye, Y. Zhao, W. Wu, A. Li et al., SuperNeurons: Dynamic GPU memory management for training deep neural networks, SIGPLAN Not., vol. 53, no. 1, pp. 41-53, 2018.