Conference papers

Implementing a GPU Programming Model on a non-GPU Accelerator Architecture

Abstract : Parallel codes are written primarily for the purpose of performance. It is highly desirable that parallel codes be portable between parallel architectures without significant performance degradation or code rewrites. While performance portability and its limits have been studied thoroughly on single processor systems, this goal has been less extensively studied and is more difficult to achieve for parallel systems. Emerging single-chip parallel platforms are no exception; writing code that obtains good performance across GPUs and other many-core CMPs can be challenging. In this paper, we focus on CUDA codes, noting that programs must obey a number of constraints to achieve high performance on an NVIDIA GPU. Under such constraints, we develop optimizations that improve the performance of CUDA code on a MIMD accelerator architecture that we are developing called Rigel. We demonstrate performance improvements with these optimizations over naïve translations, and final performance results comparable to those of codes that were hand-optimized for Rigel.

Cited literature [14 references]
Contributor : Ist Rennes
Submitted on : Monday, June 21, 2010 - 3:44:35 PM
Last modification on : Thursday, August 1, 2019 - 2:12:06 PM
Long-term archiving on : Wednesday, September 22, 2010 - 6:11:22 PM




  • HAL Id : inria-00493905, version 1



Stephen M. Kofsky, Daniel R. Johnson, John A. Stratton, Wen-Mei W. Hwu, Sanjay J. Patel, et al. Implementing a GPU Programming Model on a non-GPU Accelerator Architecture. A4MMC 2010 - 1st Workshop on Applications for Multi and Many Core Processors, Jun 2010, Saint Malo, France. ⟨inria-00493905⟩


