D. Barthou, A. Charif Rubial, W. Jalby, S. Koliai, C. Valensi et al., Performance Tuning of x86 OpenMP Codes with MAQAO, Tools for High Performance Computing, pp. 95-113, 2009.
DOI : 10.1007/978-3-642-11261-4_7

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci, A scalable cross-platform infrastructure for application performance tuning using hardware counters, Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (SC '00), 2000.

M. Geimer, F. Wolf, B. J. Wylie, E. Ábrahám, D. Becker et al., The Scalasca performance toolset architecture, Proc. of the International Workshop on Scalable Tools for High-End Computing (STHEC), pp. 51-65, 2008.
DOI : 10.1002/cpe.1556

A. Gimenez, T. Gamblin, B. Rountree, A. Bhatele, I. Jusu et al., Dissecting On-Node Memory Access Performance: A Semantic Approach, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 166-176, 2014.
DOI : 10.1109/SC.2014.19

J. Hursey, J. M. Squyres, and T. Dontje, Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011 IEEE International Conference on Cluster Computing, pp. 527-531, 2011.
DOI : 10.1109/CLUSTER.2011.59

E. Jeannot, G. Mercier, and F. Tessier, Process placement in multicore clusters: Algorithmic issues and practical techniques, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.4, pp. 993-1002, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00803548

A. Knüpfer, H. Brunst, J. Doleschal, M. Jurenz, M. Lieber et al., The Vampir Performance Analysis Tool-Set, Proceedings of the 2nd International Workshop on Parallel Tools for High Performance Computing, pp. 139-155, 2008.
DOI : 10.1007/978-3-540-68564-7_9

C. K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin: Building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05), pp. 190-200, 2005.

Z. Majo and T. R. Gross, Memory management in NUMA multicore systems: Trapped between cache contention and interconnect overhead, SIGPLAN Not., vol.46, issue.11, pp. 11-20, 2011.

V. Pillet, J. Labarta, T. Cortes, and S. Girona, PARAVER: A Tool to Visualize and Analyze Parallel Code, Proceedings of WoTUG-18: Transputer and occam Developments, pp. 17-31, 1995.

E. Rohou, Tiptop: Hardware Performance Counters for the Masses, 2012 41st International Conference on Parallel Processing Workshops, p.7789, 2011.
DOI : 10.1109/ICPPW.2012.58
URL : https://hal.archives-ouvertes.fr/hal-00639173

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp. 207-216, 2010.
DOI : 10.1109/ICPPW.2010.38

S. Zhuravlev, S. Blagodurov, and A. Fedorova, Addressing shared resource contention in multicore processors via scheduling, SIGPLAN Not., vol.45, issue.3, pp. 129-142, 2010.