Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory - Inria - French National Institute for Research in Digital Science and Technology
Report (Research Report) Year: 2012

Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory

Junjie Lai
  • Role: Author
  • PersonId : 913983
André Seznec

Abstract

In this paper, we study the characteristics of the NVIDIA GPU architecture that affect the SGEMM routine, and the potential peak performance of SGEMM on Fermi GPUs. Guided by this analysis, our SGEMM routine achieves about 11% (NN), 4.5% (TN), 3% (NT) and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on a GTX580 Fermi card. We also describe how to use native assembly language directly in CUDA runtime source code.
Main file
techReport.pdf (694.45 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00686006, version 1 (6 April 2012)
hal-00686006, version 2 (10 April 2012)

Identifiers

  • HAL Id: hal-00686006, version 2

Cite

Junjie Lai, André Seznec. Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory. [Research Report] RR-7923, INRIA. 2012. ⟨hal-00686006v2⟩
277 views
644 downloads
