An adaptive batch-orchestration algorithm for the heterogeneous gpu cluster environment in distributed deep learning system, IEEE International Conference on Big Data and Smart Computing, vol.18, pp.725-728, 2018. ,
Preprocessing for image classification by convolutional neural networks, RTEICT'16: 2016 IEEE International Conference on Recent Trends in Electronics, Information Communication Technology, pp.1778-1781, 2016. ,
Pre-and Post-processing in Machine Learning and Data Mining, pp.258-266, 2001. ,
, Horovod repository
Towards portable online prediction of network utilization using mpi-level monitoring, EuroPar'19 : 25th International European Conference on Parallel and Distributed Systems, pp.1-14, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02184204
Deep residual learning for image recognition, CVPR'16: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016. ,
Communication efficient distributed machine learning with the parameter server, NIPS'14:Proceedings of the 27th International Conference on Neural Information Processing Systems, vol.1, pp.19-27, 2014. ,
Asynchronous decentralized parallel stochastic gradient descent, ICML'18: The 35th International Conference on Machine Learning, pp.3049-3058, 2018. ,
TensorFlow: Large-scale machine learning on heterogeneous systems, 2015, software available from tensorflow.org ,
Caffe: Convolutional architecture for fast feature embedding, ICM'14: The 22Nd ACM International Conference on Multimedia, pp.675-678, 2014. ,
, Torch: A scientific computing framework for luajit
Scaling neural machine translation, WMT'18: Proceedings of the Third Conference on Machine Translation: Research Papers, pp.1-9, 2018. ,
An introduction to computational networks and the computational network toolkit, 2014. ,
, Distributed deep learning on hadoop and spark clusters
Firecaffe: Near-linear acceleration of deep neural network training on compute clusters, CVPR'16: 2016 Conference on Computer Vision and Pattern Recognition, pp.2592-2600, 2016. ,
Keras, 2015. ,
Convergence analysis of distributed stochastic gradient descent with shuffling, Neurocomputing, vol.337, pp.46-57, 2017. ,
Performance, energy, and scalability analysis and improvement of parallel cancer deep learning candle benchmarks, ICPP'19: Proceedings of the 48th International Conference on Parallel Processing, vol.78, p.11, 2019. ,
Imagenet training in minutes, ICPP'18: Proceedings of the 47th International Conference on Parallel Processing, vol.1, pp.1-1, 2018. ,
Scalable distributed dnn training using tensorflow and cuda-aware mpi: Characterization, designs, and performance evaluation, CCGRID'19: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.498-507, 2019. ,
Characterizing deeplearning I/O workloads in tensorflow, PDSW-DISCS'18: IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 2018. ,
I/o characterization and performance evaluation of beegfs for deep learning, ICPP'19: Proceedings of the 48th International Conference on Parallel Processing, vol.80, pp.1-80, 2019. ,
Candle/supervisor: A workflow framework for machine learning applied to cancer research, BMC Bioinformatics, issue.19, 2018. ,
, Candle benchmarks
Deep residual learning for image recognition, CVPR'16: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016. ,
ImageNet: A Large-Scale Hierarchical Image Database, CVPR'09: Conference on Computer Vision and Pattern Recognition, pp.248-255, 2009. ,
Veloc: Towards high performance adaptive asynchronous checkpointing at large scale, IPDPS'19: The 2019 IEEE International Parallel and Distributed Processing Symposium, pp.911-920, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02184203