Periodic checkpointing in PyTorch, 2018.
TensorFlow: A system for large-scale machine learning, 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 265-283, 2016.
Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory, 2019.
URL: https://hal.archives-ouvertes.fr/hal-02352969
Optimal GPU-CPU Offloading Strategies for Deep Neural Network Training, Proceedings of Euro-Par, 2020.
URL: https://hal.archives-ouvertes.fr/hal-02316266
Optimal Memory-aware Backpropagation of Deep Join Networks, 2019.
URL: https://hal.archives-ouvertes.fr/hal-02401105
Tight results for next fit and worst fit with resource augmentation, Theoretical Computer Science, vol. 411, pp. 2572-2580, 2010.
Reversible architectures for arbitrarily deep residual neural networks, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
Efficient and robust parallel DNN training through model parallelism on multi-GPU platform, 2018.
Training deep nets with sublinear memory cost, 2016.
NV-Group: link-efficient reduction for distributed deep learning on modern dense GPU systems, Proceedings of the 34th ACM International Conference on Supercomputing, pp. 1-12, 2020.
Large scale distributed deep networks, Advances in Neural Information Processing Systems, pp. 1223-1231, 2012.
Improving strong-scaling of CNN training by exploiting finer-grained parallelism, IEEE International Parallel and Distributed Processing Symposium, 2019.
Channel and filter parallelism for large-scale CNN training, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 10, 2019.
Optimal gradient checkpoint search for arbitrary computation graphs, 2018.
Computers and Intractability, vol. 174, 1979.
Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249-256, 2010.
The reversible residual network: Backpropagation without storing activations, Advances in Neural Information Processing Systems, pp. 2214-2224, 2017.
Training ImageNet in 1 hour.
Memory-efficient backpropagation through time, Advances in Neural Information Processing Systems, pp. 4125-4133, 2016.
Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, 2015.
Burst-mode optical interconnect for high performance computing systems, Conference on Lasers and Electro-Optics, vol. 1, p. 4, 2004.
Efficient convolutional neural networks for mobile vision applications, 2017.
Efficient training of giant neural networks using pipeline parallelism, Advances in Neural Information Processing Systems, pp. 103-112, 2019.
Quantized neural networks: Training neural networks with low precision weights and activations, The Journal of Machine Learning Research, vol. 18, pp. 6869-6898, 2017.
Checkmate: Breaking the memory wall with optimal tensor rematerialization, 2019.
Training on the edge: The why and the how, 1st Workshop on Parallel AI and Systems for the Edge, 2019.
URL: https://hal.archives-ouvertes.fr/hal-02069728
Efficient rematerialization for deep networks, Advances in Neural Information Processing Systems, pp. 15146-15155, 2019.
A graph theoretic framework of recomputation algorithms for memory-efficient backpropagation, 2019.
Microbenchmark performance comparison of high-speed cluster interconnects, IEEE Micro, vol. 24, no. 1, pp. 42-51, 2004.
PipeDream: generalized pipeline parallelism for DNN training, Proceedings of the 27th ACM Symposium on Operating Systems Principles, pp. 1-15, 2019.
Memory-efficient pipeline-parallel DNN training, 2020.
IBM ILOG CPLEX Optimization Studio, Angewandte Optimierung mit IBM ILOG CPLEX Optimization Studio, pp. 9-23, 2020.
GPU asynchronous stochastic gradient descent to speed up neural network training, 2013.
Automatic differentiation in PyTorch, 2017.
XNOR-Net: ImageNet classification using binary convolutional neural networks, European Conference on Computer Vision, pp. 525-542, 2016.
Virtualized deep neural networks for scalable, memory-efficient neural network design, The 49th Annual IEEE/ACM International Symposium on Microarchitecture, p. 18, 2016.
Dynamic memory management for GPU-based training of deep neural networks, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.
Efficient algorithms for device placement of DNN graph operators, 2020.
Pipe-torch: Pipeline-based distributed deep learning in a GPU cluster with heterogeneous networking, Seventh International Conference on Advanced Cloud and Big Data (CBD), pp. 55-60, 2019.
ShuffleNet: An extremely efficient convolutional neural network for mobile devices, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848-6856, 2018.
Parallelized stochastic gradient descent, Advances in Neural Information Processing Systems, pp. 2595-2603, 2010.