B. Brandfass, T. Alrutz, and T. Gerhold, Rank reordering for MPI communication optimization, Computers & Fluids, vol.80, 2012.
DOI : 10.1016/j.compfluid.2012.01.019

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser et al., LogP: Towards a Realistic Model of Parallel Computation, SIGPLAN Notices, pp.1-12, 1993.

B. Goglin, J. Hursey, and J. M. Squyres, Netloc: Towards a Comprehensive View of the HPC System Topology, 2014 43rd International Conference on Parallel Processing Workshops, pp.216-225, 2014.
DOI : 10.1109/ICPPW.2014.38
URL : https://hal.archives-ouvertes.fr/hal-01010599

T. Hatazaki, Rank reordering strategy for MPI topology creation functions, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.188-195, 1998.
DOI : 10.1007/BFb0056575

R. W. Hockney, The communication challenge for MPP: Intel Paragon and Meiko CS-2, Parallel Computing, vol.20, issue.3, pp.389-398, 1994.
DOI : 10.1016/S0167-8191(06)80021-9

J. Hursey, J. M. Squyres, and T. Dontje, Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011 IEEE International Conference on Cluster Computing, pp.527-531, 2011.
DOI : 10.1109/CLUSTER.2011.59

J. L. Träff, Implementing the MPI Process Topology Mechanism, ACM/IEEE SC 2002 Conference (SC'02), pp.1-14, 2002.
DOI : 10.1109/SC.2002.10045

E. Jeannot, G. Mercier, and F. Tessier, Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00921605

J. L. Träff, Implementing the MPI process topology mechanism, Supercomputing '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pp.1-14, 2002.

T. Kielmann, H. E. Bal, and K. Verstoep, Fast Measurement of LogP Parameters for Message Passing Platforms, pp.1176-1183, 2000.
DOI : 10.1007/3-540-45591-4_162

A. Kleen, A NUMA API for Linux, Novell Inc., 2005.

G. Mercier and J. Clet-Ortega, Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, EuroPVM/MPI, pp.104-115, 2009.
DOI : 10.1109/PDP.2009.43
URL : https://hal.archives-ouvertes.fr/inria-00392581

G. Mercier and E. Jeannot, Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, EuroMPI (Lecture Notes in Computer Science), vol.49, pp.39-49, 2011.
DOI : 10.1145/1183401.1183451
URL : https://hal.archives-ouvertes.fr/hal-00643151

PlaFRIM, Plate-forme Fédérative pour la Recherche en Informatique et Mathématiques.

J. Quintin, K. Hasanov, and A. Lastovetsky, Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms, 2013 42nd International Conference on Parallel Processing, pp.754-762, 2013.
DOI : 10.1109/ICPP.2013.89

M. J. Rashti, J. Green, P. Balaji, A. Afsahi, and W. Gropp, Multi-core and Network Aware MPI Topology Functions, Recent Advances in the Message Passing Interface: 18th European MPI Users' Group Meeting (EuroMPI 2011), pp.50-60, 2011.

J. Reinders and J. Jeffers, High Performance Parallelism Pearls, 2015.

R. A. van de Geijn and J. Watts, SUMMA: Scalable Universal Matrix Multiplication Algorithm, Concurrency: Practice and Experience, vol.9, issue.4, pp.255-274, 1997.

J. Zhang, J. Zhai, W. Chen, and W. Zheng, Process Mapping for MPI Collective Communications, Lecture Notes in Computer Science, vol.8, issue.11, pp.81-92, 2009.
DOI : 10.1109/ICPP.2005.62

H. Zhu, D. Goodell, W. Gropp, and R. Thakur, Hierarchical Collectives in MPICH2, Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.325-326, 2009.