F. Trahay, É. Brunet, and A. Denis, An analysis of the impact of multi-threading on communication performance, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5160893
URL : https://hal.archives-ouvertes.fr/inria-00381670

É. Brunet, F. Trahay, and A. Denis, A multicore-enabled multirail communication engine, 2008 IEEE International Conference on Cluster Computing, pp.316-321, 2008.
DOI : 10.1109/CLUSTR.2008.4663788
URL : https://hal.archives-ouvertes.fr/inria-00327158

M. Si, A. J. Peña, P. Balaji, M. Takagi, and Y. Ishikawa, MT-MPI, Proceedings of the 28th ACM international conference on Supercomputing, ICS '14, 2014.
DOI : 10.1145/2597652.2597658

J. Sancho, K. Barker, D. Kerbyson, and K. Davis, Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, 2006.

S. Potluri et al., Quantifying Performance Benefits of Overlap Using MPI-2 in a Seismic Modeling Application, Proceedings of the 24th ACM International Conference on Supercomputing, ser. ICS '10, pp.17-25, 2010.

G. Hager, G. Schubert, T. Schoenemeyer, and G. Wellein, Prospects for truly asynchronous communication with pure MPI and hybrid MPI/OpenMP on current supercomputing platforms

R. L. Graham, T. S. Woodall, and J. M. Squyres, Open MPI: A Flexible High Performance MPI, The 6th Annual International Conference on Parallel Processing and Applied Mathematics, 2005.
DOI : 10.1007/11752578_29

S. Sur, H. Jin, L. Chai, and D. Panda, RDMA read based rendezvous protocol for MPI over InfiniBand, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '06, pp.32-39, 2006.
DOI : 10.1145/1122971.1122978

M. J. Rashti and A. Afsahi, Improving Communication Progress and Overlap in MPI Rendezvous Protocol over RDMA-enabled Interconnects, 2008 22nd International Symposium on High Performance Computing Systems and Applications, pp.95-101, 2008.
DOI : 10.1109/HPCS.2008.10

M. Wittmann, G. Hager, T. Zeiser, and G. Wellein, Asynchronous MPI for the masses, 1302.

T. Hoefler and A. Lumsdaine, Message progression in parallel computing - to thread or not to thread?, 2008 IEEE International Conference on Cluster Computing, pp.213-222, 2008.
DOI : 10.1109/CLUSTR.2008.4663774

F. Trahay, A. Denis, O. Aumage, and R. Namyst, Improving Reactivity and Communication Overlap in MPI Using a Generic I/O Manager, Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI), ser. LNCS, pp.170-177, 2007.
DOI : 10.1007/978-3-540-75416-9_27
URL : https://hal.archives-ouvertes.fr/inria-00177167

F. Trahay and A. Denis, A scalable and generic task scheduling system for communication libraries, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
DOI : 10.1109/CLUSTR.2009.5289169
URL : https://hal.archives-ouvertes.fr/inria-00408521

Mindcraft, Web and File Server Comparison: Microsoft Windows NT Server 4

M. Wilcox, I'll do it later: Softirqs, tasklets, bottom halves, task queues, work queues and timers, Linux.conf.au, 2003.

S. Dandamudi, Reducing run queue contention in shared memory multiprocessors, Computer, vol.30, issue.3, pp.82-89, 1997.
DOI : 10.1109/2.573673

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

S. Moreaud and B. Goglin, Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, PDCS, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747

J. D. Valois, Lock-free linked lists using compare-and-swap, Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing, PODC '95, pp.214-222, 1995.
DOI : 10.1145/224964.224988

P. E. McKenney and J. D. Slingwine, Read-Copy Update: Using Execution History to Solve Concurrency Problems, Parallel and Distributed Computing and Systems, pp.509-518, 1998.

A. Shet, P. Sadayappan, D. Bernholdt, J. Nieplocha, and V. Tipparaju, A framework for characterizing overlap of communication and computation in parallel applications, Cluster Computing, vol.20, issue.2, pp.75-90, 2008.
DOI : 10.1007/s10586-007-0046-3

D. K. Panda, OSU Micro-Benchmark. Available: http://mvapich.cse.ohio-state

R. Thakur and W. Gropp, Test suite for evaluating performance of multithreaded MPI communication, Parallel Computing, vol.35, issue.12, pp.608-617, 2009.
DOI : 10.1016/j.parco.2008.12.013

É. Brunet, O. Aumage, and R. Namyst, Dynamic optimization of communications over high speed networks, HPDC-15, The 15th IEEE International Symposium on High Performance Distributed Computing, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00110773