Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2013

Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs

Junjie Lai
  • Role: Author
  • PersonId : 913983
André Seznec

Abstract

In this paper, we present an approach to estimate the performance upper bound of GPU applications based on algorithm analysis and assembly-code-level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization space is left for SGEMM and why. According to our analysis, the nature of the Fermi (Kepler) instruction set and the limited issue throughput of the schedulers are the main factors that keep SGEMM from approaching the theoretical peak performance. The estimated upper-bound peak performance of SGEMM is around 82.5% of the theoretical peak performance on the GTX580 Fermi GPU and 57.6% on the GTX680 Kepler GPU. Guided by this analysis and using the native assembly language, our SGEMM implementations achieve, on average, about 5% better performance than CUBLAS in the CUDA 4.1 SDK for large matrices on GTX580. The achieved performance is around 90% of the estimated upper-bound performance of SGEMM on GTX580. On GTX680, the best performance we achieve is around 77.3% of the estimated performance upper bound. We also describe how to use native assembly language directly in the CUDA runtime source
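
The abstract quotes two kinds of ratios: the estimated upper bound relative to theoretical peak, and the achieved performance relative to that upper bound. As a rough illustration only (not a calculation taken from the paper), multiplying the two gives the achieved fraction of theoretical peak. The dictionary and function names below are hypothetical, and the inputs are the abstract's approximate ("around") figures, so the results are approximate too; note also that the GTX680 figure is a best case while the GTX580 figure is described as typical for large matrices.

```python
# Ratios quoted in the abstract, expressed as fractions (approximate values).
UPPER_BOUND_VS_PEAK = {"GTX580": 0.825, "GTX680": 0.576}      # upper bound / theoretical peak
ACHIEVED_VS_UPPER_BOUND = {"GTX580": 0.90, "GTX680": 0.773}   # achieved / upper bound


def achieved_vs_peak(gpu: str) -> float:
    """Achieved performance as a fraction of theoretical peak (hypothetical helper)."""
    return UPPER_BOUND_VS_PEAK[gpu] * ACHIEVED_VS_UPPER_BOUND[gpu]


if __name__ == "__main__":
    for gpu in UPPER_BOUND_VS_PEAK:
        print(f"{gpu}: about {achieved_vs_peak(gpu) * 100:.0f}% of theoretical peak")
```

Under these assumptions the implementations reach roughly three quarters of theoretical peak on GTX580 and a little under half on GTX680.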
Main file
112_Lai.pdf (552.46 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00789958, version 1 (19-02-2013)

Identifiers

  • HAL Id: hal-00789958, version 1

Cite

Junjie Lai, André Seznec. Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs. CGO '13 - 2013 International Symposium on Code Generation and Optimization, Feb 2013, Shenzhen, China. ⟨hal-00789958⟩