F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

E. Jeannot, G. Mercier, and F. Tessier, Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.4, pp.993-1002.
DOI : 10.1109/TPDS.2013.104

URL : https://hal.archives-ouvertes.fr/hal-00803548

R. Yang, J. Antony, P. P. Janes, and A. P. Rendell, Memory and Thread Placement Effects as a Function of Cache Usage: A Study of the Gaussian Chemistry Code on the SunFire X4600 M2, 2008 International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN 2008), pp.31-36, 2008.
DOI : 10.1109/I-SPAN.2008.13

P. Micikevicius, Multi-GPU Programming, GPU Technology Conference, 2012.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
DOI : 10.1007/978-3-642-03869-3_80

URL : http://hal.inria.fr/docs/00/38/43/63/PDF/AugThiNamWac09Europar.pdf

B. Goglin, On the Overhead of Topology Discovery for Locality-Aware Scheduling in HPC, 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2017.
DOI : 10.1109/PDP.2017.35

URL : https://hal.archives-ouvertes.fr/hal-01402755

S. Hemmert, J. Ang, B. Carnes, P. Chiang, D. Doerfler et al., Exascale Hardware Architectures Working Group, NNSA Workshop: From Petascale to Exascale: R&D Challenges for HPC Simulation Environments, 2011.
DOI : 10.2172/1022133

URL : https://digital.library.unt.edu/ark:/67531/metadc837022/m2/1/high_res_d/1022133.pdf

Linux Weekly News, Address space randomization in 2.6, 2005.

R. H. Castain, D. Solt, J. Hursey, and A. Bouteiller, PMIx, Proceedings of the 24th European MPI Users' Group Meeting, EuroMPI '17, pp.1-14, 2017.
DOI : 10.1145/2726935.2726944

B. Goglin, Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc), 2014 International Conference on High Performance Computing & Simulation (HPCS), pp.74-81, 2014.
DOI : 10.1109/HPCSim.2014.6903671

URL : https://hal.archives-ouvertes.fr/hal-00985096

R. Rabenseifner, G. Hager, and G. Jost, Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp.427-436, 2009.
DOI : 10.1109/PDP.2009.43

L. V. Kalé, The virtualization model of parallel programming: Runtime optimizations and the state of art, LACSI 2002, 2002.

A. Friedley, T. Hoefler, M. Leininger, and A. Lumsdaine, Scalable High Performance Message Passing over InfiniBand for Open MPI, Proceedings of 3rd KiCC Workshop 2007. RWTH Aachen, 2007.

S. Sur, M. J. Koop, and D. K. Panda, High-performance and scalable MPI over InfiniBand with reduced memory usage, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, SC '06, 2006.
DOI : 10.1145/1188455.1188565

M. J. Koop, J. K. Sridhar, and D. K. Panda, Scalable MPI design over InfiniBand using eXtended Reliable Connection, 2008 IEEE International Conference on Cluster Computing, pp.203-212, 2008.
DOI : 10.1109/CLUSTR.2008.4663773

URL : http://nowlab.cse.ohio-state.edu/publications/conf-papers/2008/koop-cluster08.pdf

M. Pérache, P. Carribault, and H. Jourdren, MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption, PVM/MPI, vol.9, pp.94-103, 2009.
DOI : 10.1007/3-540-27039-6_19

B. Gerofi, A. Shimada, A. Hori, and Y. Ishikawa, Partially Separated Page Tables for Efficient Operating System Assisted Hierarchical Memory Management on Heterogeneous Architectures, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp.360-368, 2013.
DOI : 10.1109/CCGrid.2013.59

Y. Guo, C. J. Archer, M. Blocksome, S. Parker, W. Bland et al., Memory Compression Techniques for Network Address Management in MPI, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.1008-1017, 2017.
DOI : 10.1109/IPDPS.2017.18

G. Antoniu, L. Bougé, and R. Namyst, An Efficient and Transparent Thread Migration Scheme in the PM2 Runtime System, Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing, pp.496-510, 1999.

C. Huang, O. Lawlor, and L. V. Kalé, Adaptive MPI, Berlin, pp.306-322, 2004.

M. Pérache, H. Jourdren, and R. Namyst, MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, Proceedings of the 14th International Euro-Par Conference on Parallel Processing, ser. Euro-Par '08, pp.78-88, 2008.
DOI : 10.1007/978-3-540-85451-7_9