F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010.
DOI : 10.1007/s10766-010-0136-3

URL : https://hal.archives-ouvertes.fr/inria-00496295

D. Buntinas, B. Goglin, D. Goodell, G. Mercier, and S. Moreaud, Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis, 2009 International Conference on Parallel Processing, pp.462-469, 2009.
DOI : 10.1109/ICPP.2009.22

URL : https://hal.archives-ouvertes.fr/inria-00390064

H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter, Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.
DOI : 10.1016/j.jpdc.2014.06.008

URL : https://hal.archives-ouvertes.fr/hal-01017319

H. Chen, W. Chen, J. Huang, B. Robert, and H. Kuhn, MPIPP, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.353-360, 2006.
DOI : 10.1145/1183401.1183451

D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser et al., LogP: Towards a Realistic Model of Parallel Computation, Principles and Practice of Parallel Programming, pp.1-12, 1993.

B. Gerofi, M. Takagi, and Y. Ishikawa, Exploiting Hidden Non-uniformity of Uniform Memory Access on Manycore CPUs, Euro-Par 2014: Parallel Processing Workshops, Euro-Par 2014 International Workshops, Revised Selected Papers, Part II, pp.242-253, 2014.

B. Goglin, J. Hursey, and J. M. Squyres, Netloc: Towards a Comprehensive View of the HPC System Topology, 2014 43rd International Conference on Parallel Processing Workshops, pp.216-225, 2014.
DOI : 10.1109/ICPPW.2014.38

URL : https://hal.archives-ouvertes.fr/hal-01010599

B. Goglin and S. Moreaud, Dodging Non-uniform I/O Access in Hierarchical Collective Operations for Multicore Clusters, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp.788-794, 2011.
DOI : 10.1109/IPDPS.2011.222

URL : https://hal.archives-ouvertes.fr/inria-00566246

J. González-Domínguez, G. L. Taboada, B. B. Fraguela, M. J. Martín, and J. Touriño, Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite, Computers & Electrical Engineering, vol.38, issue.2, pp.258-269, 2012.
DOI : 10.1016/j.compeleceng.2011.12.007

T. Hoefler and M. Snir, Generic topology mapping strategies for large-scale parallel architectures, Proceedings of the international conference on Supercomputing, ICS '11, pp.75-85, 2011.
DOI : 10.1145/1995896.1995909

J. Hursey and J. M. Squyres, Advancing application process affinity experimentation, Proceedings of the 20th European MPI Users' Group Meeting, EuroMPI '13, pp.163-168, 2013.
DOI : 10.1145/2488551.2488603

E. Jeannot, G. Mercier, and F. Tessier, Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.4, pp.993-1002, 2014.
DOI : 10.1109/TPDS.2013.104

URL : https://hal.archives-ouvertes.fr/hal-00803548

S. Kim, D. Chandra, and Y. Solihin, Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT '04), pp.111-122, 2004.

T. Ma, G. Bosilca, A. Bouteiller, and J. J. Dongarra, Locality and Topology Aware Intra-node Communication among Multicore CPUs, Proceedings of the 17th European MPI Users Group Conference, number 6305 in Lecture Notes in Computer Science, pp.265-274, 2010.
DOI : 10.1007/978-3-642-15646-5_28

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.180.4377

S. Moreaud and B. Goglin, Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, pp.24-29, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747

S. Moreaud, B. Goglin, and R. Namyst, Adaptive MPI Multirail Tuning for Non-uniform Input/Output Access, Lecture Notes in Computer Science, vol.6305, pp.239-248, 2010.
DOI : 10.1007/978-3-642-15646-5_25

URL : https://hal.archives-ouvertes.fr/inria-00486178

F. Pellegrini, Scotch and libScotch 5.1 User's Guide, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00410332

B. Putigny, B. Goglin, and D. Barthou, A benchmark-based performance model for memory-bound HPC applications, 2014 International Conference on High Performance Computing & Simulation (HPCS), pp.943-950, 2014.
DOI : 10.1109/HPCSim.2014.6903790

URL : https://hal.archives-ouvertes.fr/hal-00985598

E. R. Rodrigues, F. L. Madruga, P. O. Navaux, and J. Panetta, Multi-core aware process mapping and its impact on communication overhead of parallel applications, 2009 IEEE Symposium on Computers and Communications, pp.811-817, 2009.
DOI : 10.1109/ISCC.2009.5202271

F. Song, S. Moore, and J. Dongarra, Feedback-directed thread scheduling with memory considerations, Proceedings of the 16th international symposium on High performance distributed computing, HPDC '07, pp.97-106, 2007.
DOI : 10.1145/1272366.1272380

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.8567

F. Song, S. Moore, and J. Dongarra, Analytical modeling and optimization for affinity based thread scheduling on multicore systems, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
DOI : 10.1109/CLUSTR.2009.5289173

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.535.5485

M. Steckermeier and F. Bellosa, Using locality information in userlevel scheduling, 1995.

H. Subramoni, S. Potluri, K. Kandalla, B. Barth, J. Vienne et al., Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
DOI : 10.1109/SC.2012.47

A. Szalay, J. Bunn, J. Gray, I. Foster, and I. Raicu, The importance of data locality in distributed computing applications, NSF Workflow Workshop, 2006.

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38

URL : http://arxiv.org/abs/1004.4431