D. Buntinas, G. Mercier, and W. Gropp, Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pp.10-20, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00344350

W. Gropp, E. Lusk, N. Doss, and A. Skjellum, A high-performance, portable implementation of the MPI message passing interface standard, Parallel Computing, vol.22, issue.6, pp.789-828, 1996.
DOI : 10.1016/0167-8191(96)00024-5

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

L. Huse, Collective communication on dedicated clusters of workstations, in Recent Advances in Parallel Virtual Machine and Message Passing Interface, ser. Lecture Notes in Computer Science, p.682, 1999.

R. Graham and G. Shipman, MPI support for multi-core architectures: Optimized shared memory collectives, in Recent Advances in Parallel Virtual Machine and Message Passing Interface, ser. Lecture Notes in Computer Science, pp.130-140, 2008.

G. E. Fagg, G. Bosilca, J. Pješivac-Grbović, T. Angskun, and J. Dongarra, Tuned: A flexible high performance collective communication component developed for Open MPI, Proceedings of DAPSYS'06, pp.65-72, 2006.

P. Geoffray, L. Prylli, and B. Tourancheau, BIP-SMP: High performance message passing over a cluster of commodity SMPs, Proceedings of the 1999 ACM/IEEE Conference on Supercomputing (CDROM), Supercomputing '99, 1999.
DOI : 10.1145/331532.331552

R. Brightwell, Exploiting Direct Access Shared Memory for MPI On Multi-Core Processors, International Journal of High Performance Computing Applications, vol.24, issue.1, pp.69-77, 2010.
DOI : 10.1177/1094342009359014

H. Jin, S. Sur, L. Chai, and D. K. Panda, Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems, 2007 IEEE International Conference on Cluster Computing, pp.446-451, 2007.
DOI : 10.1109/CLUSTR.2007.4629263

D. Buntinas, B. Goglin, D. Goodell, G. Mercier, and S. Moreaud, Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis, 2009 International Conference on Parallel Processing, pp.462-469, 2009.
DOI : 10.1109/ICPP.2009.22

URL : https://hal.archives-ouvertes.fr/inria-00390064

S. Moreaud, B. Goglin, D. Goodell, and R. Namyst, Optimizing MPI communication within large multicore nodes with kernel assistance, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010.
DOI : 10.1109/IPDPSW.2010.5470849

URL : https://hal.archives-ouvertes.fr/inria-00451471

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

M. Chaarawi, J. M. Squyres, E. Gabriel, and S. Feki, A Tool for Optimizing Runtime Parameters of Open MPI, Proceedings, 15th European PVM/MPI Users' Group Meeting, pp.210-217, 2008.

A. Plaat, H. E. Bal, R. F. Hofman, and T. Kielmann, Sensitivity of parallel applications to large differences in bandwidth and latency in two-layer interconnects, Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, pp.769-782, 1999.
DOI : 10.1109/HPCA.1999.744376