XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server

Thierry Gautier; Joao Vicente Ferreira Lima

doi:10.1109/PDP50117.2020.00008

Conference paper, Year: 2020

Thierry Gautier

Role: Author
PersonId: 2115
IdHAL: thierrygautierinrialpesfr
IdRef: 064810712

Algorithms and Software Architectures for Distributed and HPC Platforms

Joao Vicente Ferreira Lima

Role: Author
PersonId: 774978
IdRef: 188250409

Universidade Federal de Santa Maria = Federal University of Santa Maria [Santa Maria, RS, Brazil]

Abstract

Over the last ten years, GPUs have dominated the market in terms of the compute-to-power ratio, and numerous research works have provided GPU-accelerated implementations of the Basic Linear Algebra Subprograms (BLAS). Several software libraries have been developed to exploit the performance of systems with accelerators, but the performance actually achieved may be far from the platform's peak. This paper presents XKBlas, which aims to improve the performance of BLAS-3 kernels on multi-GPU systems. At the low level, we model computation as a set of tasks accessing data on different resources. At the high level, the API design favors non-blocking calls as a uniform concept for overlapping latency, even for fine-grained computation. Unit benchmarks of BLAS-3 kernels showed that XKBlas outperformed most implementations, even when the overhead of dynamic task creation and scheduling is included. It outperformed BLAS implementations such as cuBLAS-XT, PaRSEC, BLASX and Chameleon/StarPU.
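The two design levels named in the abstract (a task model underneath, non-blocking calls on top) can be illustrated with a minimal sketch. This is not the XKBlas API: the function names `gemm_tile`, `gemm_async` and `gather` are hypothetical, CPU worker threads stand in for GPUs, and the matrices are plain Python lists. It only shows the general pattern of splitting a matrix product into per-tile tasks, returning to the caller immediately, and synchronizing later.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm_tile(A, B, i0, j0, ts, n):
    """Task body: compute one (at most ts x ts) tile of C = A @ B.

    A and B are square n x n matrices stored as lists of lists.
    """
    tile = []
    for i in range(i0, min(i0 + ts, n)):
        row = []
        for j in range(j0, min(j0 + ts, n)):
            row.append(sum(A[i][k] * B[k][j] for k in range(n)))
        tile.append(row)
    return i0, j0, tile


def gemm_async(pool, A, B, ts):
    """Non-blocking call (hypothetical name): submit one task per output
    tile and return the futures without waiting for any result."""
    n = len(A)
    return [
        pool.submit(gemm_tile, A, B, i0, j0, ts, n)
        for i0 in range(0, n, ts)
        for j0 in range(0, n, ts)
    ]


def gather(futures, n):
    """Synchronization point: wait for every tile task and assemble C."""
    C = [[0] * n for _ in range(n)]
    for f in futures:
        i0, j0, tile = f.result()  # blocks until this tile is ready
        for di, row in enumerate(tile):
            C[i0 + di][j0:j0 + len(row)] = row
    return C


def run_example():
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = gemm_async(pool, A, identity, ts=2)
        # The caller is free to issue further calls here while tiles run.
        return gather(futures, len(A))
```

Calling `run_example()` multiplies a 3 x 3 matrix by the identity using 2 x 2 tiles; the tile size and worker count are arbitrary choices for illustration, and a real multi-GPU runtime would also have to manage data transfers between devices, which this sketch leaves out.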

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

pdp2020.pdf (760.52 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03121583

Submitted on: Tuesday, September 28, 2021, 10:21:26

Last modified on: Thursday, February 1, 2024, 10:05:59

Long-term archiving on: Wednesday, December 29, 2021, 18:15:44

Dates and versions

hal-03121583, version 1 (28-09-2021)

Identifiers

HAL Id: hal-03121583, version 1
DOI: 10.1109/PDP50117.2020.00008

Cite

Thierry Gautier, Joao Vicente Ferreira Lima. XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server. PDP 2020 - 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Mar 2020, Västerås, Sweden. pp.1-8, ⟨10.1109/PDP50117.2020.00008⟩. ⟨hal-03121583⟩

Collections

ENS-LYON UNIV-RENNES1 CNRS INRIA UNIV-LYON1 IRISA INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UDL UR1-MATH-NUM INRIA-BRASIL

80 views

300 downloads
