Intel Corporation, An Introduction to the Intel QuickPath Interconnect, 2009.

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

R. L. Graham and G. Shipman, MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives, Proceedings of the 15th European PVM/MPI Users' Group Meeting, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.130-140, 2008.
DOI : 10.1007/978-3-540-87475-1_21

H. Jang and H. Jin, MiAMI: Multi-core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces, 2009 17th IEEE Symposium on High Performance Interconnects, pp.73-82, 2009.
DOI : 10.1109/HOTI.2009.19

E. Jeannot and G. Mercier, Near-Optimal Placement of MPI Processes on Hierarchical NUMA Architectures, Proceedings of the 16th International Euro-Par Conference, 2010.
DOI : 10.1007/978-3-642-15291-7_20
URL : https://hal.archives-ouvertes.fr/inria-00544346

K. Kandalla, H. Subramoni, G. Santhanaraman, M. Koop, and D. K. Panda, Designing multi-leader-based Allgather algorithms for multi-core clusters, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5160896

C. N. Keltcher, K. J. McGrath, A. Ahmed, and P. Conway, The AMD Opteron processor for multiprocessor servers, IEEE Micro, vol.23, issue.2, pp.66-76, 2003.
DOI : 10.1109/MM.2003.1196116

R. Kumar, A. Mamidala, and D. K. Panda, Scaling alltoall collective on multi-core systems, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536141
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

A. R. Mamidala, R. Kumar, D. De, and D. K. Panda, MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008.
DOI : 10.1109/CCGRID.2008.87

S. Moreaud and B. Goglin, Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, The 19th IASTED International Conference on Parallel and Distributed Computing and Systems, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747

S. Moreaud, B. Goglin, and R. Namyst, Adaptive MPI Multirail Tuning for Non-uniform Input/Output Access, Lecture Notes in Computer Science, vol.6305, pp.239-248, 2010.
DOI : 10.1007/978-3-642-15646-5_25
URL : https://hal.archives-ouvertes.fr/inria-00486178

J. M. Squyres and A. Lumsdaine, The Component Architecture of Open MPI: Enabling Third-Party Collective Algorithms, Proceedings, 18th ACM International Conference on Supercomputing, Workshop on Component Models and Systems for Grid Applications, pp.167-185, 2004.
DOI : 10.1007/0-387-23352-0_11

R. Thakur and W. Gropp, Improving the Performance of Collective Operations in MPICH, Proceedings of the 10th European PVM/MPI Users' Group Meeting, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.257-267, 2003.
DOI : 10.1007/978-3-540-39924-7_38

R. Yang, J. Antony, P. P. Janes, and A. P. Rendell, Memory and Thread Placement Effects as a Function of Cache Usage: A Study of the Gaussian Chemistry Code on the SunFire X4600 M2, 2008 International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN 2008), pp.31-36, 2008.
DOI : 10.1109/I-SPAN.2008.13