The impact of cache misses on the performance of matrix product algorithms on multicore platforms

Mathias Jacquelin; Loris Marchal; Yves Robert

Report (Research Report), Year: 2010

Mathias Jacquelin

Role: Author
PersonId: 860090

Algorithms and Scheduling for Distributed Heterogeneous Platforms

Loris Marchal

Role: Corresponding author
PersonId: 170697
IdHAL: loris-marchal
ORCID: 0000-0002-5519-9913
IdRef: 112112986

Algorithms and Scheduling for Distributed Heterogeneous Platforms

Yves Robert

Role: Author
PersonId: 739318
IdHAL: yves-robert
ORCID: 0000-0003-2361-055X
IdRef: 029813611

Algorithms and Scheduling for Distributed Heterogeneous Platforms

Abstract

The multicore revolution is underway, bringing new chips with more complex memory architectures. Classical algorithms must be revisited to take the hierarchical memory layout into account. In this paper, we aim to design cache-aware algorithms that minimize the number of cache misses incurred during the execution of the matrix product kernel on a multicore processor. We show analytically how to achieve the best possible tradeoff between shared and distributed caches. We implement and evaluate several algorithms on two multicore platforms, one equipped with a quad-core Xeon processor and the other with a GPU. The impact of cache misses turns out to differ greatly between the two platforms, and we identify the main design parameters that lead to peak performance for each target hardware configuration.

Keywords

Multicore platform; Matrix product; Cache misses; Cache-aware algorithms

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

RR-7456.pdf (466.01 KB)

Origin: Files produced by the author(s)

Contributor: Loris Marchal

https://inria.hal.science/inria-00537822

Submitted on: Friday, November 19, 2010, 14:32:06

Last modified on: Thursday, February 15, 2024, 03:31:33

Long-term archiving on: Friday, October 26, 2012, 16:02:11

Dates and versions

inria-00537822, version 1 (19-11-2010)

Identifiers

HAL Id: inria-00537822, version 1

Cite

Mathias Jacquelin, Loris Marchal, Yves Robert. The impact of cache misses on the performance of matrix product algorithms on multicore platforms. [Research Report] RR-7456, INRIA. 2010, pp.32. ⟨inria-00537822⟩

Collections

ENS-LYON UNIV-RENNES1 CNRS INRIA UNIV-LYON1 IRISA INRIA-RRRT INRIA2 LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UDL UR1-MATH-NUM

125 views

332 downloads
