D. E. Bernholdt, S. Boehm, G. Bosilca, M. G. Venkata, R. E. Grant et al., A Survey of MPI Usage in the U.S. Exascale Computing Project, ExaMPI workshop (in conjunction with Supercomputing), 2017.

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), 2010.
DOI : 10.1109/pdp.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, Version 3.0, Tech. rep., 2012.

A. Kleen, A NUMA API for Linux.

B. Nichols, D. Buttlar, and J. P. Farrell, Pthreads Programming, 1996.

J. Reinders and J. Jeffers, High Performance Parallelism Pearls, vol.2, 2015.

J. L. Träff, Implementing the MPI Process Topology Mechanism, in: Supercomputing '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pp.1-14, 2002.

G. Mercier and E. Jeannot, Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, Lecture Notes in Computer Science, vol.6960, pp.39-49, 2011.
DOI : 10.1007/978-3-642-24449-0_7

URL : https://hal.archives-ouvertes.fr/hal-00643151

M. J. Rashti, J. Green, P. Balaji, A. Afsahi, and W. Gropp, Multi-core and Network Aware MPI Topology Functions, EuroMPI 2011: Recent Advances in the Message Passing Interface, 18th European MPI Users' Group Meeting, vol.6960, pp.50-60, 2011.

E. Jeannot, G. Mercier, and F. Tessier, Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, IEEE Trans. Parallel Distrib. Syst., vol.25, issue.4, pp.993-1002, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00803548

T. Hatazaki, Rank Reordering Strategy for MPI Topology Creation Functions, Recent Advances in Parallel Virtual Machine and Message Passing Interface, vol.1497, pp.188-195, 1998.

J. Hursey, J. M. Squyres, and T. Dontje, Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, pp.527-531, 2011.

G. Mercier and J. Clet-Ortega, Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, EuroPVM/MPI, vol.5759, pp.104-115, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00392581

B. Brandfass, T. Alrutz, and T. Gerhold, Rank Reordering for MPI Communication Optimization

J. Zhang, J. Zhai, W. Chen, and W. Zheng, Process Mapping for MPI Collective Communications, vol.5704, pp.81-92, 2009.

B. Goglin, J. Hursey, and J. M. Squyres, Netloc: Towards a comprehensive view of the HPC system topology, in: 43rd International Conference on Parallel Processing Workshops, pp.216-225, 2014.

R. W. Hockney, The Communication Challenge for MPP: Intel Paragon and Meiko CS-2, Parallel Comput., vol.20, issue.3, pp.80021-80030, 1994.

D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser et al., LogP: Towards a Realistic Model of Parallel Computation, SIGPLAN Not., vol.28, issue.7, 1993.

T. Kielmann, H. E. Bal, and K. Verstoep, Fast Measurement of LogP Parameters for Message Passing Platforms, pp.1176-1183, 2000.

PlaFRIM, Plateforme fédérative pour la recherche en informatique et mathématiques.

H. Zhu, D. Goodell, W. Gropp, and R. Thakur, Hierarchical Collectives in MPICH2, Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.325-326, 2009.

J. Quintin, K. Hasanov, and A. Lastovetsky, Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms, 42nd International Conference on Parallel Processing, pp.754-762, 2013.

R. A. van de Geijn and J. Watts, SUMMA: Scalable Universal Matrix Multiplication Algorithm, Concurrency: Practice and Experience, vol.9, issue.4, pp.255-274, 1997.