Parallel programming with migratable objects: Charm++ in practice, SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp.647-658, 2014.
Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs, GPU Computing Gems, vol.2, 2010.
URL: https://hal.archives-ouvertes.fr/inria-00547847
Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, IEEE Transactions on Parallel and Distributed Systems, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01618526
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience (CCPE), Special Issue: Euro-Par, pp.187-198, 2011.
URL: https://hal.archives-ouvertes.fr/inria-00384363
NewMadeleine: a Fast Communication Scheduling Engine for High Performance Networks, Workshop on Communication Architecture for Clusters, 2007.
URL: https://hal.archives-ouvertes.fr/inria-00122723
Legion: Expressing locality and independence with logical regions, SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2012.
Dense linear algebra on distributed heterogeneous hardware with a symbolic DAG approach, Scalable Computing and Communications: Theory and Practice, pp.699-735, 2013.
Scalability of the NewMadeleine Communication Library for Large Numbers of MPI Point-to-Point Requests, CCGrid 2019 - 19th Annual IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2019.
URL: https://hal.archives-ouvertes.fr/hal-02103700
Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures, 2013.
Automatic multithreaded parallel program generation for message passing multiprocessors using parameterized task graphs, International Conference 'Parallel Computing', 2001.
URL: https://hal.archives-ouvertes.fr/inria-00100489
ParalleX: an advanced parallel execution model for scaling-impaired applications, 2009 International Conference on Parallel Processing Workshops, pp.394-401, 2009.
Performance analysis of MPI collective operations, Cluster Computing, vol.10, issue.2, pp.127-143, 2007.
Two-tree algorithms for full bandwidth broadcast, reduction and scan, Parallel Computing, vol.35, issue.12, pp.581-594, 2009.
A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, vol.24, pp.2421-2448, 2012.
Optimal broadcast for fully connected processor-node networks, Journal of Parallel and Distributed Computing, vol.68, issue.7, pp.887-901, 2008.
A survey of methods for collective communication optimization and tuning, 2016.