Conference papers

Evaluation of two topology-aware heuristics on level-3 BLAS library for multi-GPU platforms

Abstract : GPUs now dominate the market in terms of the computing/power metric, and numerous research works have provided Basic Linear Algebra Subprograms (BLAS) implementations accelerated on GPUs. Several software libraries have been developed to exploit the performance of systems with accelerators, but on platforms with multiple GPUs the achieved performance may be far from the platform's peak. This paper presents two runtime heuristics that improve performance when task-based programs run on heterogeneous architectures such as multi-GPU systems. The first is a topology-aware policy that takes into account the heterogeneity of the high-speed links that interconnect GPUs. The second is an optimistic heuristic that favors communication between devices. Both have been implemented in the XKBlas BLAS-3 library. We ran experiments on an NVIDIA DGX-1 with up to 8 V100 GPUs on a set of BLAS routines. Experimental results on these kernels showed that XKBlas outperformed most implementations, even when the overhead of creating and scheduling dynamic tasks is included.
Contributor : Thierry Gautier
Submitted on : Sunday, October 3, 2021 - 5:49:32 PM
Last modification on : Tuesday, October 25, 2022 - 4:19:04 PM
Long-term archiving on : Tuesday, January 4, 2022 - 6:16:17 PM


Files produced by the author(s)


  • HAL Id : hal-03363275, version 1


Thierry Gautier, Joao Vicente Ferreira Lima. Evaluation of two topology-aware heuristics on level-3 BLAS library for multi-GPU platforms. PAW-ATM 2021 - 4th Annual Parallel Applications Workshop, Alternatives To MPI+X, Nov 2021, Saint Louis, United States. pp.1-11. ⟨hal-03363275⟩


