F. Trahay, E. Brunet, and A. Denis, An analysis of the impact of multi-threading on communication performance, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5160893
URL : https://hal.archives-ouvertes.fr/inria-00381670

F. Trahay, E. Brunet, A. Denis, and R. Namyst, A multithreaded communication engine for multicore architectures, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536139
URL : https://hal.archives-ouvertes.fr/inria-00224999

R. L. Graham, T. S. Woodall, and J. M. Squyres, Open MPI: A Flexible High Performance MPI, The 6th Annual International Conference on Parallel Processing and Applied Mathematics, 2005.
DOI : 10.1007/11752578_29

P. Balaji, D. Buntinas, D. Goodell, W. Gropp, and R. Thakur, Toward Efficient Support for Multithreaded MPI Communication, Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.120-129, 2008.
DOI : 10.1006/jpdc.2000.1674

S. Coll, E. Frachtenberg, F. Petrini, A. Hoisie, and L. Gurvits, Using multirail networks in high-performance clusters, Proceedings. 2001 IEEE International Conference on, pp.15-24, 2001.

O. Aumage, E. Brunet, N. Furmento, and R. Namyst, NewMadeleine: a fast communication scheduling engine for high performance networks, CAC 2007: Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS, 2007. Also available as LaBRI Report 1421-07 and INRIA RR-6085.

S. Moreaud and B. Goglin, Impact of NUMA effects on high-speed networking with multi-Opteron machines, The 19th IASTED International Conference on Parallel and Distributed Computing and Systems, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747

J. Sancho, K. Barker, D. Kerbyson, and K. Davis, Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, 2006.

Myricom Inc., Myrinet EXpress (MX): A High Performance, Low-level, Message-Passing Interface for Myrinet, 2003.

S. Sur, H. Jin, L. Chai, and D. Panda, RDMA read based rendezvous protocol for MPI over InfiniBand, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '06, pp.32-39, 2006.
DOI : 10.1145/1122971.1122978

A. Maccabe, W. Zhu, J. Otto, and R. Riesen, Experience in offloading protocol processing to a programmable NIC, Proceedings. IEEE International Conference on Cluster Computing, pp.67-74, 2002.
DOI : 10.1109/CLUSTR.2002.1137730

S. Dandamudi and S. Cheng, Performance impact of run queue organization and synchronization on large-scale NUMA multiprocessor systems, Journal of Systems Architecture, vol.43, issue.6-7, pp.491-512, 1997.
DOI : 10.1016/S1383-7621(96)00059-8

F. Trahay, A. Denis, O. Aumage, and R. Namyst, Improving reactivity and communication overlap in MPI using a generic I/O manager, Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI), Lecture Notes in Computer Science, pp.170-177, 2007.

J. Liu, B. Chandrasekaran, J. Wu, W. Jiang, S. Kini et al., Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC '03, pp.58-58, 2003.
DOI : 10.1145/1048935.1050208

A. Shet, P. Sadayappan, D. Bernholdt, J. Nieplocha, and V. Tipparaju, A framework for characterizing overlap of communication and computation in parallel applications, Cluster Computing, vol.20, issue.2, pp.75-90, 2008.
DOI : 10.1007/s10586-007-0046-3