Mellanox Technologies, NVIDIA GPUDirect Technology: Accelerating GPU-based Systems, 2010.

C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst, Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010.
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, 2010.
DOI : 10.1016/B978-0-12-385963-1.00034-4

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators, Symposium on Application Accelerators in High Performance Computing (SAAHPC), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00547616

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou et al., LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), 2011.
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614

E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: A portable linear algebra library for high-performance computers, 1990.

J. Chen, W. Watson III, and W. Mao, GMH: A Message Passing Toolkit for GPU Clusters, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010.
DOI : 10.1109/ICPADS.2010.35

A. Barak, T. Ben-Nun, E. Levy, and A. Shiloh, A package for OpenCL based heterogeneous computing on clusters with many GPU devices, 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010.
DOI : 10.1109/CLUSTERWKSP.2010.5613086

Y. Zheng, C. Iancu, P. H. Hargrove, S. J. Min, and K. Yelick, Extending Unified Parallel C for GPU Computing, SIAM Conference on Parallel Processing for Scientific Computing, 2010.

J. Lee, M. T. Tran, T. Odajima, T. Boku, and M. Sato, An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters, HeteroPar, 2011.
DOI : 10.1007/978-3-642-29737-3_48

J. Bueno, J. Planas, A. Duran, X. Martorell, E. Ayguadé et al., Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012.
DOI : 10.1109/IPDPS.2012.58

F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, R. A. van de Geijn et al., The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, 2012.
DOI : 10.1016/j.jpdc.2011.10.014

G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, Symposium on Principles and Practice of Parallel Programming, 2009.

T. Heller, H. Kaiser, and K. Iglberger, Application of the ParalleX execution model to stencil-based problems, International Supercomputing Conference, 2012.
DOI : 10.1007/s00450-012-0217-1

H. Kaiser, M. Brodowicz, and T. Sterling, ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications, 2009 International Conference on Parallel Processing Workshops, 2009.
DOI : 10.1109/ICPPW.2009.14

A. Tabbal, M. Anderson, M. Brodowicz, H. Kaiser, and T. Sterling, Preliminary design examination of the ParalleX system from a software and hardware perspective, ACM SIGMETRICS Performance Evaluation Review, vol.38, issue.4, 2011.
DOI : 10.1145/1964218.1964232

T. D. Hartley, E. Saule, and Ü. V. Çatalyürek, Improving performance of adaptive component-based dataflow middleware, Parallel Computing, 2012.

C. K. Luk, S. Hong, and H. Kim, Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, 2009.
DOI : 10.1145/1669112.1669121

G. Bosilca, A. Bouteiller, A. Danalis, T. Hérault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, 2012.

F. Song, H. Ltaief, B. Hadri, and J. Dongarra, Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.48

F. Song, A. YarKhan, and J. Dongarra, Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, 2009.
DOI : 10.1145/1654059.1654079

D. Kunzman, Runtime support for object-based message-driven parallel applications on heterogeneous clusters, 2012.

D. M. Kunzman and L. V. Kalé, Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, vol.19, issue.1, 2011.
DOI : 10.1155/2011/525717

G. Zheng, E. Meneses, A. Bhatele, and L. V. Kalé, Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers, 2010 39th International Conference on Parallel Processing Workshops, pp.2-2, 2010.
DOI : 10.1109/ICPPW.2010.65

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, Locality-aware work stealing on multi-CPU and multi-GPU architectures, Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00780890

E. Hermann, B. Raffin, F. Faure, T. Gautier, and J. Allard, Multi-GPU and multi-CPU parallelization for interactive physics simulations, The 16th International Euro-Par Conference on Parallel Processing: Part II, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00502448

C. Lauderdale and R. Khan, Towards a codelet-based runtime for exascale computing, Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, EXADAPT '12, 2012.
DOI : 10.1145/2185475.2185478