E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

D. Buntinas, G. Mercier, and W. Gropp, Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem, Sixth IEEE International Symposium on Cluster Computing and the Grid, pp.10-20, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00344350

W. Gropp, E. Lusk, N. Doss, and A. Skjellum, A high-performance, portable implementation of the MPI message passing interface standard, Parallel Computing, vol.22, issue.6, pp.789-828, 1996.
DOI : 10.1016/0167-8191(96)00024-5

D. Buntinas, B. Goglin, D. Goodell, G. Mercier, and S. Moreaud, Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis, 2009 International Conference on Parallel Processing, pp.462-469, 2009.
DOI : 10.1109/ICPP.2009.22
URL : https://hal.archives-ouvertes.fr/inria-00390064

H. Jin, S. Sur, L. Chai, and D. Panda, LiMIC: support for high-performance MPI intra-node communication on Linux cluster, International Conference on Parallel Processing, pp.184-191, 2005.

T. Ma, G. Bosilca, A. Bouteiller, and J. J. Dongarra, Locality and Topology Aware Intra-node Communication among Multicore CPUs, Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface, ser. EuroMPI'10, pp.265-274, 2010.
DOI : 10.1007/978-3-642-15646-5_28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.180.4377

L. Huse, Collective communication on dedicated clusters of workstations, in Recent Advances in Parallel Virtual Machine and Message Passing Interface, ser. Lecture Notes in Computer Science, pp.469-476, 1999.

R. Graham and G. Shipman, MPI support for multi-core architectures: Optimized shared memory collectives, in Recent Advances in Parallel Virtual Machine and Message Passing Interface, ser. Lecture Notes in Computer Science, pp.130-140, 2008.

G. E. Fagg, G. Bosilca, J. Pješivac-Grbović, T. Angskun, and J. Dongarra, Tuned: A flexible high performance collective communication component developed for Open MPI, Proceedings of DAPSYS'06, pp.65-72, 2006.

P. Geoffray, L. Prylli, and B. Tourancheau, BIP-SMP, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '99, 1999.
DOI : 10.1145/331532.331552

R. Brightwell, K. Pedretti, and T. Hudson, SMARTMAP: Operating system support for efficient data sharing among processes on a multi-core processor, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.25:1-25:12, 2008.
DOI : 10.1109/SC.2008.5218881

S. Moreaud, B. Goglin, D. Goodell, and R. Namyst, Optimizing MPI communication within large multicore nodes with kernel assistance, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010.
DOI : 10.1109/IPDPSW.2010.5470849
URL : https://hal.archives-ouvertes.fr/inria-00451471

T. Kielmann, R. F. Hofman, H. E. Bal, A. Plaat, and R. A. Bhoedjang, MagPIe: MPI's collective communication operations for clustered wide area systems, Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.131-140, 1999.

F. Broquedis, J. C. Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

J. M. Squyres and A. Lumsdaine, The Component Architecture of Open MPI: Enabling Third-Party Collective Algorithms, Proceedings, 18th ACM International Conference on Supercomputing, Workshop on Component Models and Systems for Grid Applications, pp.167-185, 2004.
DOI : 10.1007/0-387-23352-0_11

A. Plaat, H. E. Bal, R. F. Hofman, and T. Kielmann, Sensitivity of parallel applications to large differences in bandwidth and latency in two-layer interconnects, Future Generation Computer Systems, vol.17, issue.6, pp.769-782, 2001.
DOI : 10.1016/S0167-739X(00)00103-5