L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin et al., HPCTOOLKIT: tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, pp.685-701, 2010.
DOI : http://doi.acm.org/10.1145/1654059.1654111

Y. Bao, M. Chen, Y. Ruan, L. Liu, J. Fan et al., HMTT: A Platform Independent Full-system Memory Trace Monitoring System, Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '08, pp.229-240, 2008.

R. Bosch, C. Stolte, D. Tang, J. Gerth, M. Rosenblum et al., Rivet, ACM SIGGRAPH Computer Graphics, vol.34, issue.1, pp.68-73, 2000.
DOI : 10.1145/563788.604455

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize et al., Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems, Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp.381-393, 2013.

L. Derose, K. Ekanadham, J. K. Hollingsworth, and S. Sbaraglia, SIGMA: A Simulator Infrastructure to Guide Memory Analysis, ACM/IEEE SC 2002 Conference (SC'02)
DOI : 10.1109/SC.2002.10055

L. A. DeRose, The Hardware Performance Monitor Toolkit, Euro-Par 2001 Parallel Processing, pp.122-132

M. Diener, E. H. Cruz, and P. O. Navaux, Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp.9-16, 2015.
DOI : 10.1109/PDP.2015.11

M. Diener, E. H. Cruz, P. O. Navaux, A. Busse, and H. Heiß, kMAF, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, 2014.
DOI : 10.1145/2628071.2628085

U. Drepper, What every programmer should know about memory, 2007.

P. J. Drongowski, Instruction-based sampling: A new performance analysis technique for AMD family 10h processors, 2007.

P. J. Drongowski, An introduction to analysis and optimization with AMD CodeAnalyst Performance Analyzer, 2008.

F. Dupros, H. Aochi, A. Ducellier, D. Komatitsch, and J. Roman, Exploiting Intensive Multithreading for the Efficient Simulation of 3D Seismic Wave Propagation, IEEE International Conference on Computational Science and Engineering (CSE), pp.253-260, 2008.

A. Giménez, T. Gamblin, B. Rountree, A. Bhatele, I. Jusu et al., Dissecting On-Node Memory Access Performance: A Semantic Approach, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.166-176, 2014.
DOI : 10.1109/SC.2014.19

Intel, Intel Performance Counter Monitor - A better way to measure CPU utilization, 2012.

T. Jiang, Q. Zhang, R. Hou, L. Chai, S. A. McKee et al., Understanding the behavior of in-memory computing workloads, 2014 IEEE International Symposium on Workload Characterization (IISWC), pp.22-30, 2014.
DOI : 10.1109/IISWC.2014.6983036

H. Jin, M. Frumkin, and J. Yan, The OpenMP implementation of NAS Parallel Benchmarks and Its Performance, 1999.

R. Lachaize, B. Lepers, and V. Quéma, MemProf: A Memory Profiler for NUMA Multicore Systems, USENIX 2012 Annual Technical Conference (USENIX ATC 12), pp.53-64, 2012.

X. Liu and J. Mellor-Crummey, A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pp.259-272, 2014.

H. Löf and S. Holmgren, affinity-on-next-touch: Increasing the Performance of an Industrial PDE Solver on a cc-NUMA System, International Conference on Supercomputing (SC), pp.387-392, 2005.

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation, Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pp.190-200, 2005.

Z. Majo and T. R. Gross, (Mis)understanding the NUMA memory system performance of multithreaded workloads, 2013 IEEE International Symposium on Workload Characterization (IISWC), pp.11-22, 2013.
DOI : 10.1109/IISWC.2013.6704666

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott, Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems, International Parallel Processing Symposium (IPPS), pp.480-485, 1995.

M. Martonosi, A. Gupta, and T. Anderson, MemSpy: Analyzing Memory System Bottlenecks in Programs, Proceedings of the 1992 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '92/PERFORMANCE '92, pp.1-12, 1992.

C. McCurdy and J. Vetter, Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.87-96, 2010.
DOI : 10.1109/ISPASS.2010.5452060

G. Piccoli, H. N. Santos, R. E. Rodrigues, C. Pousa, E. Borin et al., Compiler support for selective page migration in NUMA architectures, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.369-380, 2014.
DOI : 10.1145/2628071.2628077

J. Reinders, VTune performance analyzer essentials, 2005.

C. P. Ribeiro, J. Mehaut, A. Carissimi, M. Castro et al., Memory Affinity for Hierarchical Shared Memory Multiprocessors, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.59-66, 2009.

J. Tao, W. Karl, and M. Schulz, Visualizing the Memory Access Behavior of Shared Memory Applications on NUMA Architectures.
DOI : 10.1007/3-540-45718-6_91

B. Weyers, C. Terboven, D. Schmidl, J. Herber, T. W. Kuhlen et al., Visualization of Memory Access Behavior on Hierarchical NUMA Architectures, 2014 First Workshop on Visual Performance Analysis, pp.42-49, 2014.
DOI : 10.1109/VPA.2014.12
