An analysis of the impact of multi-threading on communication performance, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009. ,
DOI : 10.1109/IPDPS.2009.5160893
URL : https://hal.archives-ouvertes.fr/inria-00381670
A multithreaded communication engine for multicore architectures, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008. ,
DOI : 10.1109/IPDPS.2008.4536139
URL : https://hal.archives-ouvertes.fr/inria-00224999
Open MPI: A Flexible High Performance MPI, The 6th Annual International Conference on Parallel Processing and Applied Mathematics, 2005. ,
DOI : 10.1007/11752578_29
Toward Efficient Support for Multithreaded MPI Communication, Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.120-129, 2008. ,
DOI : 10.1006/jpdc.2000.1674
Using multirail networks in high-performance clusters, Proceedings. 2001 IEEE International Conference on, pp.15-24, 2001. ,
Newmadeleine: a fast communication scheduling engine for high performance networks, " in CAC 2007: Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS, also available as LaBRI Report 1421-07 and INRIA RR-6085. [Online]. Available, 2007. ,
Impact of numa effects on high-speed networking with multi-opteron machines, The 19th IASTED International Conference on Parallel and Distributed Computing and Systems, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00175747
Quantifying the potential benefit of overlapping communication and computation in largescale scientific applications, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, 2006. ,
Myrinet EXpress (MX): A High Performance, Low-level, Message-Passing Interface for Myrinet, 2003. ,
RDMA read based rendezvous protocol for MPI over InfiniBand, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming , PPoPP '06, pp.32-39, 2006. ,
DOI : 10.1145/1122971.1122978
Experience in offloading protocol processing to a programmable NIC, Proceedings. IEEE International Conference on Cluster Computing, pp.67-74, 2002. ,
DOI : 10.1109/CLUSTR.2002.1137730
Performance impact of run queue organization and synchronization on large-scale NUMA multiprocessor systems, Journal of Systems Architecture, vol.43, issue.6-7, pp.491-512, 1997. ,
DOI : 10.1016/S1383-7621(96)00059-8
Improving reactivity and communication overlap in mpi using a generic i/o manager, " in EuroPVM/MPI, ser. Lecture Notes in Computer Science Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.170-177, 2007. ,
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC '03, pp.58-58, 2003. ,
DOI : 10.1145/1048935.1050208
A framework for characterizing overlap of communication and computation in parallel applications, Cluster Computing, vol.20, issue.2, pp.75-90, 2008. ,
DOI : 10.1007/s10586-007-0046-3