Conference paper, Year: 2018

Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs

Abstract

We consider the problem of data allocation when performing matrix multiplication on a heterogeneous node with multicore CPUs and GPUs. Classical (cyclic) allocations designed for homogeneous settings are not appropriate, but the advent of task-based runtime systems makes it possible to use more general allocations. Previous theoretical work has proposed square and cube partitioning algorithms aimed at minimizing data movement for matrix multiplication. We propose techniques to adapt these continuous square partitionings to the allocation of discrete matrix tiles, and strategies to adapt the static allocation at run time. We use these techniques in an implementation of matrix multiplication based on the StarPU runtime system, and we show through extensive experiments that this implementation consistently achieves a lower communication volume while slightly improving execution time, compared to standard state-of-the-art dynamic strategies.
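To make the notion of communication volume concrete, here is a minimal sketch under a simplified cost model that is an assumption of this example, not the paper's implementation: a worker that owns a set of tiles of C = A x B must load one row panel of A per distinct tile row it owns and one column panel of B per distinct tile column. The function names, the 4 x 4 tile grid, and the three workers (worker 2 assumed twice as fast as workers 0 and 1) are all hypothetical.

```python
def communication_volume(alloc):
    """Per-worker count of distinct tile rows plus distinct tile columns owned.

    alloc[i][j] is the id of the worker that computes tile (i, j) of C.
    Assumed cost model: one unit per row panel of A or column panel of B loaded.
    """
    rows, cols = {}, {}
    for i, row in enumerate(alloc):
        for j, worker in enumerate(row):
            rows.setdefault(worker, set()).add(i)
            cols.setdefault(worker, set()).add(j)
    return {w: len(rows[w]) + len(cols[w]) for w in rows}


def tile_counts(alloc):
    """Number of tiles of C assigned to each worker (a proxy for its load)."""
    counts = {}
    for row in alloc:
        for worker in row:
            counts[worker] = counts.get(worker, 0) + 1
    return counts


# Hypothetical allocation 1: horizontal strips sized by relative speed.
strips = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
]

# Hypothetical allocation 2: same tile counts, but squarer shapes.
squares = [
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]

for name, alloc in (("strips", strips), ("squares", squares)):
    total = sum(communication_volume(alloc).values())
    print(name, tile_counts(alloc), "total volume:", total)
# strips  -> total volume: 16
# squares -> total volume: 14
```

Both allocations give each worker the same number of tiles (4, 4, and 8), so the load balance is identical; only the shapes differ. Under this model the squarer shapes need fewer panels in total, which is the intuition behind partitioning into near-square regions.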
Main file
icpp.pdf (747.23 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01670678, version 1 (22-12-2017)
hal-01670678, version 2 (31-05-2018)

Cite

Lionel Eyraud-Dubois, Thomas Lambert. Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs. ICPP 2018 - 47th International Conference on Parallel Processing, Aug 2018, Eugene, OR, United States. ⟨10.1145/3225058.3225066⟩. ⟨hal-01670678v2⟩