Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory

Junjie Lai; André Seznec

Report (Research Report), Year: 2012

Junjie Lai

Role: Author
PersonId: 913983

Amdahl's Law is Forever

André Seznec

Role: Author
PersonId: 13729
IdHAL: andre-seznec
ORCID: 0000-0002-3058-6503
IdRef: 033236402

Amdahl's Law is Forever

Abstract

In this paper, we studied the characteristics of the NVIDIA GPU architecture that affect the SGEMM routine, and the potential peak performance of SGEMM on Fermi GPUs. Guided by this analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT), and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on a GTX580 Fermi card. We also described how to use native assembly language directly in CUDA runtime source code.

Domains

Hardware Architecture [cs.AR]

Main file

techReport.pdf (694.45 KB)

Origin: Files produced by the author(s)

Contributor: Junjie Lai

https://inria.hal.science/hal-00686006

Submitted on: Tuesday, April 10, 2012, 10:19:09

Last modified on: Friday, March 24, 2023, 14:52:55

Long-term archiving on: Friday, March 31, 2017, 04:30:28

Dates and versions

hal-00686006, version 1 (06-04-2012)

hal-00686006, version 2 (10-04-2012)

Identifiers

HAL Id: hal-00686006, version 2

Cite

Junjie Lai, André Seznec. Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory. [Research Report] RR-7923, INRIA. 2012. ⟨hal-00686006v2⟩

Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA INRIA-RRRT IRISA-D3 INRIA2 LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR UR1-MATH-NUM

277 views

644 downloads
