C. Augonnet, S. Thibault, and R. Namyst, Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Proceedings of the International Euro-Par Workshops, 2009.
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333

C. Augonnet, S. Thibault, R. Namyst, and M. Nijhuis, Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System, SAMOS Workshop, 2009.
DOI : 10.1007/978-3-642-03138-0_36
URL : https://hal.archives-ouvertes.fr/inria-00378705

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Proceedings of the 15th Euro-Par Conference, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. M. Badia, D. Cabrera, A. Duran, M. Gonzalez et al., A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures, IWOMP '09: Proceedings of the 5th International Workshop on OpenMP, pp.154-167, 2009.
DOI : 10.1007/978-3-540-79561-2_10

E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th Euro-Par Conference, 2009.
DOI : 10.1109/TPDS.2003.1214317

P. Bellens, J. M. Pérez, F. Cabarcas, A. Ramírez, R. M. Badia et al., CellSs: Scheduling Techniques to Better Exploit Memory Hierarchy, Scientific Programming, pp.77-95, 2009.
DOI : 10.1155/2009/561672

C. H. Crawford, P. Henning, M. Kistler, and C. Wright, Accelerating computing with the cell broadband engine processor, Proceedings of the 2008 conference on Computing frontiers, CF '08, pp.3-12, 2008.
DOI : 10.1145/1366230.1366234

G. F. Diamos and S. Yalamanchili, Harmony: an execution model and runtime for heterogeneous many core systems, HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing, pp.197-200, 2008.

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A hybrid multi-core parallel programming environment, 2007.

K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, ACM SIGPLAN Notices, vol.33, issue.5, pp.212-223, 1998.
DOI : 10.1145/277652.277725

N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, and J. Manferdelli, High performance discrete Fourier transforms on graphics processors, SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pp.1-12, 2008.

V. J. Jiménez, L. Vilanova, I. Gelado, M. Gil, G. Fursin et al., Predictive Runtime Code Scheduling for Heterogeneous Architectures, HiPEAC, pp.19-33, 2009.

Y. Li, J. Dongarra, and S. Tomov, A Note on Auto-tuning GEMM for GPUs, ICCS (1), pp.884-892, 2009.
DOI : 10.1007/978-3-642-01970-8_89

S. Moreaud and B. Goglin, Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, The 19th IASTED International Conference on Parallel and Distributed Computing and Systems, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747

M. Nijhuis, H. Bos, H. E. Bal, and C. Augonnet, Mapping and Synchronizing Streaming Applications on Cell Processors, HiPEAC, pp.216-230, 2009.
DOI : 10.1007/978-3-540-92990-1_17
URL : https://hal.archives-ouvertes.fr/inria-00445993

J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger et al., A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, vol.26, issue.1, pp.80-113, 2007.
DOI : 10.1016/j.rti.2005.04.002

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, p.284, 2009.
DOI : 10.1177/1094342009106195

G. Teodoro, R. Sachetto, O. Sertel, M. Gurcan, W. Meira Jr. et al., Coordinating the use of GPU and CPU for improving performance of compute intensive applications, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
DOI : 10.1109/CLUSTR.2009.5289193

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, 2009.
DOI : 10.1016/j.parco.2009.12.005

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.

V. Volkov and J. W. Demmel, Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2008.
DOI : 10.1109/SC.2008.5214359
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.218.3436

L. Wesolowski, An application programming interface for general purpose graphics processing units in an asynchronous runtime system, Master's thesis, 2008.

R. C. Whaley and J. Dongarra, Automatically Tuned Linear Algebra Software, Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.
