M. D. McCool, K. Wadleigh, B. Henderson, and H.-Y. Lin, Performance Evaluation of GPUs Using the RapidMind Development Platform, Proceedings of the ACM/IEEE Conference on Supercomputing, 2006.

S. Lee, T. Johnson, and R. Eigenmann, Cetus: An Extensible Compiler Infrastructure for Source-to-Source Transformation, Proceedings of the International Workshop on Languages and Compilers for Parallel Computing, 2003.
DOI : 10.1007/978-3-540-24644-2_35

S. Ueng, M. Lathara, S. S. Baghsorkhi, and W. W. Hwu, CUDA-Lite: Reducing GPU Programming Complexity, Proceedings of International Workshop on Languages and Compilers for Parallel Computing, 2008.
DOI : 10.1007/978-3-540-89740-8_1

IMPACT Research Group, The Parboil benchmark suite, 2007.

Q. Hou, K. Zhou, and B. Guo, BSGP: bulk-synchronous GPU programming, ACM Transactions on Graphics, vol.27, issue.3, 2008.

T. D. Han and T. S. Abdelrahman, hiCUDA: a high-level directive-based language for GPU programming, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2, 2009.
DOI : 10.1145/1513895.1513902

S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk et al., Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, 2008.
DOI : 10.1145/1345206.1345220

S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S. Ueng et al., Program optimization space pruning for a multithreaded GPU, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, 2008.
DOI : 10.1145/1356058.1356084

I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian et al., Brook for GPUs, ACM Transactions on Graphics, vol.23, issue.3, pp.777-786, 2004.
DOI : 10.1145/1015706.1015800

J. A. Stratton, S. S. Stone, and W. W. Hwu, MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, Proceedings of International Workshop on Languages and Compilers for Parallel Computing, 2008.
DOI : 10.1007/978-3-540-89740-8_2

S. Liao, Z. Du, G. Wu, and G. Lueh, Data and computation transformations for Brook streaming applications on multiprocessors, Proceedings of the 4th International Symposium on Code Generation and Optimization, 2006.

M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev et al., A Compiler Framework for Optimization of Affine Loop Nests for GPGPU, Proceedings of the 22nd Annual International Conference on Supercomputing, 2008.

S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU: a compiler framework for automatic translation and optimization, Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009.

Sh: A High-Level Metaprogramming Language for Modern GPUs, 2004.