Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory

Junjie Lai 1 André Seznec 1
1 ALF - Amdahl's Law is Forever
Inria Rennes – Bretagne Atlantique , IRISA-D3 - ARCHITECTURE
Abstract : In this paper, we studied the characteristics of the NVIDIA GPU architecture that affect the SGEMM routine, as well as the potential peak performance of SGEMM on Fermi GPUs. Guided by this analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT) and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on a GTX580 Fermi card. We also described how to use native assembly language directly in CUDA runtime source code.
Document type :
Reports

https://hal.inria.fr/hal-00686006
Contributor : Junjie Lai
Submitted on : Friday, April 6, 2012 - 4:16:54 PM
Last modification on : Thursday, November 15, 2018 - 11:57:43 AM
Long-term archiving on : Wednesday, December 14, 2016 - 8:32:58 PM

File

techReport.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00686006, version 1

Citation

Junjie Lai, André Seznec. Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory. [Research Report] RR-7923, 2012. ⟨hal-00686006v1⟩
