G. Diamos, The design and implementation of Ocelot's dynamic binary translator from PTX to multi-core x86, 2009.

M. Frigo and S. Johnson, FFTW: an adaptive software architecture for the FFT, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.1381-1384, 1998.
DOI : 10.1109/ICASSP.1998.681704

R. C. Whaley and J. J. Dongarra, Automatically tuned linear algebra software, 1997.

J. H. Kelm, D. R. Johnson, M. R. Johnson, N. C. Crago, W. Tuohy et al., Rigel: An architecture and scalable programming interface for a 1000-core accelerator, Proceedings of the International Symposium on Computer Architecture, pp.140-151, 2009.

Khronos OpenCL Working Group, The OpenCL Specification, version 1.0, 2008.

S. M. Kofsky, Achieving performance portability across parallel accelerator architectures, Center for Reliable and High-Performance Computing, 2010.

C. Lin, The portability of parallel programs across MIMD computers, 1992.

E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31

J. M. Moura, J. Johnson, R. W. Johnson, D. Padua, V. K. Prasanna et al., SPIRAL: Automatic implementation of signal processing algorithms, High Performance Embedded Computing (HPEC), 2000.

J. Nickolls, I. Buck, M. Garland, and K. Skadron, Scalable parallel programming with CUDA, Queue, vol.6, issue.2, 2008.
DOI : 10.1145/1365490.1365500

S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S. Ueng et al., Program optimization space pruning for a multithreaded GPU, Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '08, pp.195-204, 2008.
DOI : 10.1145/1356058.1356084

L. Snyder, The design and development of ZPL, Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, HOPL III, pp.8-9, 2007.
DOI : 10.1145/1238844.1238852

S. S. Stone, J. P. Haldar, S. C. Tsao, W. W. Hwu, B. P. Sutton et al., Accelerating advanced MRI reconstructions on GPUs, J. Parallel Distrib. Comput., vol.68, issue.10, pp.1307-1318, 2008.

J. A. Stratton, S. S. Stone, and W. W. Hwu, MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, Languages and Compilers for Parallel Computing, LCPC 2008, pp.16-30, 2008.
DOI : 10.1007/978-3-540-89740-8_2