MPI unrolls the entire task graph in order to discover the tasks it must execute, as well as the sends and receives it will need to post during execution. However, this unrolling is a limiting factor for scalability ,
pruning", i.e. splitting the task graph so that each StarPU-MPI instance only knows about the sub-graph of tasks it needs ,
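To make the two excerpts above concrete, here is a minimal Python sketch of what unrolling a task graph on every rank involves, and of how small the sub-graph relevant to one rank can be. It is an illustration under stated assumptions, not StarPU-MPI's implementation: the `Task` class, the `unroll` function, the "data stays on the rank that last wrote it" rule, and the sample task list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    rank: int            # rank that executes the task (assumed fixed up front)
    reads: tuple = ()    # data handles the task reads
    writes: tuple = ()   # data handles the task writes


def unroll(tasks, initial_owner, me):
    """Walk the WHOLE task list as rank `me`.

    Returns (local, sends, recvs, relevant):
      local    - names of tasks executed by `me`
      sends    - (data, destination rank, consumer task) triples
      recvs    - (data, source rank, consumer task) triples
      relevant - names of tasks that made `me` execute or send something
    """
    owner = dict(initial_owner)   # rank holding the last-written copy
    local, sends, recvs, relevant = [], [], [], []
    for t in tasks:               # every rank visits every task
        touches_me = t.rank == me
        for d in t.reads:
            src = owner[d]
            if src == t.rank:
                continue          # data already in place, no transfer
            if src == me:
                sends.append((d, t.rank, t.name))
                touches_me = True
            elif t.rank == me:
                recvs.append((d, src, t.name))
        if t.rank == me:
            local.append(t.name)
        if touches_me:
            relevant.append(t.name)
        for d in t.writes:
            owner[d] = t.rank     # simplifying assumption: writer keeps the data
    return local, sends, recvs, relevant


# Hypothetical example: three ranks, three data handles, five tasks.
OWNER = {"A": 0, "B": 1, "C": 2}
TASKS = [
    Task("T1", 0, reads=("A",), writes=("A",)),
    Task("T2", 1, reads=("A", "B"), writes=("B",)),
    Task("T3", 2, reads=("C",), writes=("C",)),
    Task("T4", 2, reads=("B", "C"), writes=("C",)),
    Task("T5", 0, reads=("A",), writes=("A",)),
]

local, sends, recvs, relevant = unroll(TASKS, OWNER, me=0)
```

In this example rank 0 walks all five tasks but only three of them (`relevant`) affect what it executes or sends. The cost of visiting the remaining tasks on every rank is the overhead that pruning aims to remove. How the sub-graph is actually determined without a full walk is not described in the excerpts and is not attempted here.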
The FLAME approach : From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012. ,
Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009. ,
DOI : 10.1145/1669112.1669121
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013. ,
DOI : 10.1109/IPDPS.2013.66
URL : https://hal.archives-ouvertes.fr/hal-00799904
GMH: A Message Passing Toolkit for GPU Clusters, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.35-42, 2010. ,
DOI : 10.1109/ICPADS.2010.35
Implementation and Performance Evaluation of XcalableMP: A Parallel Programming Language for Distributed Memory Systems, 2010 39th International Conference on Parallel Processing Workshops, pp.413-420, 2010. ,
DOI : 10.1109/ICPPW.2010.62
An Extension of XcalableMP PGAS Lanaguage for Multi-node GPU Clusters, Euro-Par 2011 : Parallel Processing Workshops, pp.429-439, 2012. ,
DOI : 10.1007/978-3-642-29737-3_48
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
Programming heterogeneous clusters with accelerators using object-based programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011. ,
ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications, 2009 International Conference on Parallel Processing Workshops, pp.394-401, 2009. ,
DOI : 10.1109/ICPPW.2009.14
Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012. ,
DOI : 10.1109/IPDPS.2012.58
Scheduling Tasks over Multicore machines enhanced with accelerators : a Runtime System's Perspective, 2011. ,
StarPU : a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation : Practice and Experience, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, Recent Advances in the Message Passing Interface, pp.298-299, 2012. ,
DOI : 10.1007/978-3-642-33518-1_40
URL : https://hal.archives-ouvertes.fr/hal-00725477
Stanimire Tomov, et al., Faster, cheaper, better - a hybridization methodology to develop linear algebra software for GPUs, GPU Computing Gems, 2010. ,
Matrices over runtime systems at exascale, High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, pp.1330-1331, 2012. ,
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics : Conference Series, p.012037, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
Quark users' guide : Queueing and runtime for kernels, 2011. ,
Pajé trace file format, Tech. rep, 2003. ,
An opensource tool-chain for performance analysis, Tools for High Performance Computing 2011, pp.37-48, 2012. ,
MPI 2.2 standard - Message Passing Interface Forum [Online ; accessed 3, 2009. ,
Definitions and Detection of Deadlock, Livelock, and Starvation in Concurrent Programs, 1994 International Conference on Parallel Processing (ICPP'94), pp.69-72, 1994. ,
DOI : 10.1109/ICPP.1994.84
Synchronization without contention, ACM SIGPLAN Notices, vol.26, issue.4, pp.269-278, 1991. ,