Currently, StarPU-MPI unrolls the entire task graph in order to discover the tasks it must execute, as well as the sends and receives that will need to be posted during execution. However, this unrolling is a limiting factor for scalability. One idea would be to perform "pruning", i.e. to cut up the task graph so that each StarPU-MPI instance only knows about the sub-graph of tasks it actually needs.
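The pruning idea can be illustrated with a minimal sketch. It is not StarPU-MPI code: the `Task` class and the `executor`, `unroll` and `prune` helpers are hypothetical names introduced here. The sketch assumes static data ownership and an owner-computes rule (a task runs on the rank owning the first datum it writes). Under those assumptions, a rank only needs the tasks it executes or whose accessed data it owns; every other task can be dropped without changing what that rank executes, sends or receives.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    reads: Tuple[str, ...]   # data read by the task
    writes: Tuple[str, ...]  # data written by the task (at least one)


def executor(task: Task, owner: Dict[str, int]) -> int:
    # Assumption: owner-computes, the rank owning the first written datum runs the task.
    return owner[task.writes[0]]


def unroll(tasks: List[Task], owner: Dict[str, int], rank: int):
    """Walk a task list and collect what `rank` must execute, send and receive."""
    execs, sends, recvs = [], [], []
    for t in tasks:
        ex = executor(t, owner)
        if ex == rank:
            execs.append(t.name)
        for d in t.reads:
            if ex == rank and owner[d] != rank:
                recvs.append((d, owner[d], t.name))  # receive d from its owner
            elif ex != rank and owner[d] == rank:
                sends.append((d, ex, t.name))        # send d to the executing rank
    return execs, sends, recvs


def prune(tasks: List[Task], owner: Dict[str, int], rank: int) -> List[Task]:
    """Keep only the tasks that `rank` executes or whose accessed data it owns."""
    return [
        t for t in tasks
        if executor(t, owner) == rank
        or any(owner[d] == rank for d in t.reads + t.writes)
    ]


# Toy example (hypothetical data): three data items, one per rank, and a chain of tasks.
owner = {"A": 0, "B": 1, "C": 2}
tasks = [
    Task("t0", (), ("A",)),
    Task("t1", ("A",), ("B",)),
    Task("t2", ("B",), ("C",)),
    Task("t3", (), ("C",)),
]

for rank in range(3):
    sub = prune(tasks, owner, rank)
    # The pruned sub-graph yields the same executions and communications.
    assert unroll(sub, owner, rank) == unroll(tasks, owner, rank)
    print(rank, [t.name for t in sub])
```

In this toy case each rank keeps two of the four tasks. Note that `prune` still walks the full task list, so it only shows which tasks are needed; a scalable implementation would have to generate each sub-graph without enumerating the whole graph on every node.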

F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí et al., The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012.

C. Luk, S. Hong, and H. Kim, Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-42, pp.45-55, 2009.
DOI : 10.1145/1669112.1669121

T. Gautier, J. V. F. Lima, N. Maillard, and B. Raffin, XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013.
DOI : 10.1109/IPDPS.2013.66

URL : https://hal.archives-ouvertes.fr/hal-00799904

J. Chen, W. Watson, and W. Mao, GMH: A Message Passing Toolkit for GPU Clusters, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.35-42, 2010.
DOI : 10.1109/ICPADS.2010.35

J. Lee and M. Sato, Implementation and Performance Evaluation of XcalableMP: A Parallel Programming Language for Distributed Memory Systems, 2010 39th International Conference on Parallel Processing Workshops, pp.413-420, 2010.
DOI : 10.1109/ICPPW.2010.62

J. Lee, M. T. Tran, T. Odajima, T. Boku, and M. Sato, An Extension of XcalableMP PGAS Lanaguage for Multi-node GPU Clusters, Euro-Par 2011 : Parallel Processing Workshops, pp.429-439, 2012.
DOI : 10.1007/978-3-642-29737-3_48

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

D. M. Kunzman and L. V. Kalé, Programming heterogeneous clusters with accelerators using object-based programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011.

H. Kaiser, M. Brodowicz, and T. Sterling, ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications, 2009 International Conference on Parallel Processing Workshops, pp.394-401, 2009.
DOI : 10.1109/ICPPW.2009.14

J. Bueno, J. Planas, A. Duran, R. M. Badia, X. Martorell et al., Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012.
DOI : 10.1109/IPDPS.2012.58

C. Augonnet, Scheduling Tasks over Multicore machines enhanced with accelerators: a Runtime System's Perspective, 2011.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault, StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, Recent Advances in the Message Passing Interface, pp.298-299, 2012.
DOI : 10.1007/978-3-642-33518-1_40

URL : https://hal.archives-ouvertes.fr/hal-00725477

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, Faster, cheaper, better: a hybridization methodology to develop linear algebra software for GPUs, GPU Computing Gems, 2010.

E. Agullo, G. Bosilca, B. Bramas, C. Castagnede, O. Coulaud et al., Matrices over runtime systems at exascale, High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, pp.1330-1331, 2012.

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics : Conference Series, p.12037, 2009.
DOI : 10.1088/1742-6596/180/1/012037

A. YarKhan, J. Kurzak, and J. Dongarra, QUARK users' guide: QUeueing And Runtime for Kernels, 2011.

B. de Oliveira Stein, J. Chassin de Kergommeaux, and G. Mounié, Pajé trace file format, Tech. rep., 2003.

K. Coulomb, A. Degomme, M. Faverge, and F. Trahay, An open-source tool-chain for performance analysis, Tools for High Performance Computing 2011, pp.37-48, 2012.

MPI Forum, MPI 2.2 standard - Message Passing Interface Forum [Online; accessed 3, 2009.

K. Tai, Definitions and Detection of Deadlock, Livelock, and Starvation in Concurrent Programs, 1994 International Conference on Parallel Processing (ICPP'94), pp.69-72, 1994.
DOI : 10.1109/ICPP.1994.84

J. M. Mellor-Crummey and M. L. Scott, Synchronization without contention, ACM SIGPLAN Notices, vol.26, issue.4, pp.269-278, 1991.