G. Almási, P. Heidelberger, C. J. Archer, X. Martorell, C. C. Erway et al., Optimization of MPI Collective Communication on BlueGene/L Systems, Proceedings of the 19th Annual International Conference on Supercomputing, ICS '05, pp.253-262, 2005.

K. Coulomb, M. Faverge, J. Jazeix, O. Lagrasse, J. Marcoueille et al., Visual Trace Explorer, 2016.

J. Chassin de Kergommeaux and B. de Oliveira Stein, Pajé: An extensible environment for visualizing multi-threaded programs executions, Euro-Par 2000 Parallel Processing, pp.133-140, 2000.

A. Denis, pioman: a pthread-based Multithreaded Communication Engine, Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01087775

S. Derradji, T. Palfer-Sollier, J. P. Panziera, A. Poudes, and F. Wellenreiter, The BXI Interconnect architecture, 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects (HOTI), 2015.

T. Hoefler, P. Gottschling, and A. Lumsdaine, Brief Announcement: Leveraging Non-blocking Collective Communication in High-performance Applications, Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures, SPAA'08, pp.113-115, 2008.

T. Hoefler and A. Lumsdaine, Message Progression in Parallel Computing - To Thread or not to Thread?, Proceedings of the 2008 IEEE International Conference on Cluster Computing, 2008.

T. Hoefler and A. Lumsdaine, Optimizing non-blocking Collective Operations for InfiniBand, Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium, CAC'08 Workshop, 2008.

T. Hoefler, A. Lumsdaine, and W. Rehm, Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI, Proceedings of the 2007 International Conference on High Performance Computing, Networking, Storage and Analysis, vol.07, 2007.

Intel Corporation, IMB-NBC benchmarks, 2018.

P. Lai, P. Balaji, R. Thakur, and D. Panda, ProOnE: A General Purpose Protocol Onload Engine for Multi-and Many-Core Architectures, 2009.

T. Ma, G. Bosilca, A. Bouteiller, B. Goglin, J. M. Squyres et al., Kernel Assisted Collective Intranode MPI Communication Among Multi-core and Manycore CPUs, 40th International Conference on Parallel Processing (ICPP-2011), 2011.
URL : https://hal.archives-ouvertes.fr/inria-00602877

MPI Forum, MPI: A Message-Passing Interface Standard Version, 2012.

M. Pérache, H. Jourdren, and R. Namyst, MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, the 14th International Euro-Par Conference, vol.5168, pp.78-88, 2008.

M. J. Rashti and A. Afsahi, Improving communication progress and overlap in MPI Rendezvous protocol over RDMA-enabled interconnects, High Performance Computing Systems and Applications, pp.95-101, 2008.

A. Ruhela, H. Subramoni, S. Chakraborty, M. Bayatpour, P. Kousha et al., Efficient asynchronous communication progress for MPI without dedicated resources, Proceedings of the 25th European MPI Users' Group Meeting, EuroMPI'18, 2018.

P. Sanders and J. L. Träff, Parallel prefix (scan) algorithms for MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.49-57, 2006.

M. Si, A. Peña, P. Balaji, M. Takagi, and Y. Ishikawa, MT-MPI: multithreaded MPI for many-core environments, Proceedings of the International Conference on Supercomputing, 2014.

S. Sur, H. Jin, L. Chai, and D. Panda, RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.32-39, 2006.

Author Biographies

Alexandre Denis is a research scientist at Inria. He received his Master's degree from École Normale Supérieure de Lyon in 2000 and his Ph.D. in the domain of grid computing from the University of Rennes in 2003. After defending his Ph.D., he was a visiting researcher at Vrije Universiteit Amsterdam. Since 2004, he has been conducting his research at Inria Bordeaux Sud-Ouest and at the LaBRI laboratory. He works in the domain of high-performance computing, network communications, and MPI. He is currently focusing on communication/computation overlap.

During his Ph.D., he started the MPC (MultiProcessor Computing) framework. He joined CEA in 2006 as a research engineer to extend the MPC framework and provide building blocks for high-performance multithreaded applications. He received the "Habilitation à Diriger des Recherches" (a French degree that accredits the holder to supervise research) from Versailles Saint-Quentin-en-Yvelines University in 2015. Since 2018, he has led an R&D group dealing with runtime systems, tools for HPC systems, and frameworks for scientific computing codes.

After completing his Master's degree in High Performance Computing at the University of Versailles in 2015, Hugo Taboada obtained his Ph.D. degree from the University of Bordeaux under the direction of Emmanuel Jeannot and Alexandre Denis.

Emmanuel Jeannot is a senior research scientist at Inria. He has been conducting his research at Inria Bordeaux Sud-Ouest and at the LaBRI laboratory since 2009. He received his Ph.D. degree in computer science from École Normale Supérieure de Lyon, 2000.