Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs

Junjie Lai; André Seznec

Conference paper, Year: 2013

Junjie Lai

Role: Author
PersonId: 913983

Amdahl's Law is Forever

André Seznec

Role: Author
PersonId: 13729
IdHAL: andre-seznec
ORCID: 0000-0002-3058-6503
IdRef: 033236402

Amdahl's Law is Forever

Abstract

In this paper, we present an approach to estimating the performance upper bound of GPU applications based on algorithm analysis and assembly-code-level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization space is left for SGEMM, and why. According to our analysis, the nature of the Fermi (Kepler) instruction set and the limited issue throughput of the schedulers are the main factors that keep SGEMM from approaching the theoretical peak performance. The estimated upper-bound peak performance of SGEMM is around 82.5% of the theoretical peak performance on the GTX580 Fermi GPU and 57.6% on the GTX680 Kepler GPU. Guided by this analysis and using the native assembly language, our SGEMM implementations achieve on average about 5% better performance than CUBLAS in the CUDA 4.1 SDK for large matrices on the GTX580. The achieved performance is around 90% of the estimated upper-bound performance of SGEMM on the GTX580. On the GTX680, the best performance we achieve is around 77.3% of the estimated performance upper bound. We also describe how to use native assembly language directly in the CUDA runtime source
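The two kinds of percentages quoted in the abstract can be chained to see roughly where the implementations land relative to theoretical peak. The sketch below is not taken from the paper: it only multiplies the ratios stated above, and the names `BOUND_VS_PEAK`, `ACHIEVED_VS_BOUND`, and `achieved_vs_peak` are hypothetical. Because the abstract gives approximate figures ("around 90%", "best performance"), the products are rough estimates: roughly 74% of theoretical peak on the GTX580 and roughly 45% on the GTX680.

```python
# Ratios quoted in the abstract (relative values only; no absolute GFLOPS are assumed).
BOUND_VS_PEAK = {"GTX580": 0.825, "GTX680": 0.576}     # estimated upper bound / theoretical peak
ACHIEVED_VS_BOUND = {"GTX580": 0.90, "GTX680": 0.773}  # achieved / estimated upper bound


def achieved_vs_peak(gpu: str) -> float:
    """Achieved performance as a fraction of theoretical peak (product of the two ratios)."""
    return BOUND_VS_PEAK[gpu] * ACHIEVED_VS_BOUND[gpu]


if __name__ == "__main__":
    for gpu in BOUND_VS_PEAK:
        print(f"{gpu}: about {achieved_vs_peak(gpu):.1%} of theoretical peak")
```

Read this way, most of the gap to theoretical peak on both GPUs comes from the estimated upper bound itself, not from the distance between the implementation and that bound.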

Keywords

Kepler GPU; Fermi GPU; SGEMM; CUDA; Performance Upper Bound Analysis

Domains

Performance and reliability [cs.PF]

Main file

112_Lai.pdf (552.46 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-00789958

Submitted on: Tuesday, February 19, 2013, 10:28:23

Last modified on: Friday, March 24, 2023, 14:52:56

Long-term archiving on: Sunday, April 2, 2017, 02:41:04

Dates and versions

hal-00789958, version 1 (19-02-2013)

Identifiers

HAL Id: hal-00789958, version 1

Cite

Junjie Lai, André Seznec. Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs. CGO '13 - 2013 International Symposium on Code Generation and Optimization, Feb 2013, Shenzhen, China. ⟨hal-00789958⟩

Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D3 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR UR1-MATH-NUM