L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin et al., HPCTOOLKIT: tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, p.685-701, 2010.
DOI : http://doi.acm.org/10.1145/1654059.1654111

X. Liu and J. Mellor-Crummey, A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '14, p.259-272, 2014.

R. Lachaize, B. Lepers, and V. Quema, MemProf: A Memory Profiler for NUMA Multicore Systems, USENIX 2012 Annual Technical Conference (USENIX ATC 12), p.53-64

. Rep, Available: https, 2015.

Intel, Intel Performance Counter Monitor - A better way to measure CPU utilization, 2012.

P. J. Drongowski, An introduction to analysis and optimization with AMD CodeAnalyst™ Performance Analyzer, Tech. Rep, 2008.

V. M. Weaver, D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula et al., PAPI 5: Measuring power, energy, and the cloud, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013.

LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments, Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, 2010.

(Mis)understanding the NUMA memory system performance of multithreaded workloads, 2013 IEEE International Symposium on Workload Characterization (IISWC), p.11-22, 2013.

Z. Mckee, N. Jia, and . Sun, Understanding the behavior of in-memory computing workloads, 2014 IEEE International Symposium on Workload Characterization (IISWC), p.22-30, 2014.

R. Bosch, C. Stolte, D. Tang, J. Gerth, M. Rosenblum et al., Rivet, ACM SIGGRAPH Computer Graphics, vol.34, issue.1, p.68
DOI : 10.1145/563788.604455

B. Weyers, C. Terboven, D. Schmidl, J. Herber, T. W. Kuhlen et al., Visualization of Memory Access Behavior on Hierarchical NUMA Architectures, 2014 First Workshop on Visual Performance Analysis, p.42-49, 2014.
DOI : 10.1109/VPA.2014.12

J. Tao, W. Karl, and M. Schulz, Visualizing the Memory Access Behavior of Shared Memory Applications on NUMA Architectures, Computational Science - ICCS 2001
DOI : 10.1007/3-540-45718-6_91

C. McCurdy and J. Vetter, Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), p.87-96, 2010.
DOI : 10.1109/ISPASS.2010.5452060

B. Bremer and . Hamann, Dissecting On-Node Memory Access Performance: A Semantic Approach, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '14, p.166-176, 2014.

P. J. Drongowski, Instruction-based sampling: A new performance analysis technique for AMD family 10h processors, AMD CodeAnalyst Project, 2007.

D. Levinthal, Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors.

H. and S. Gelabert, Towards instantaneous performance analysis using coarse-grain sampled and instrumented data

Y. Bao, M. Chen, Y. Ruan, L. Liu, J. Fan et al., HMTT: A Platform Independent Full-system Memory Trace Monitoring System, Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, ser. SIGMETRICS '08, p.229-240, 2008.

M. Martonosi, A. Gupta, and T. Anderson, MemSpy: Analyzing Memory System Bottlenecks in Programs, Proceedings of the 1992 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems
DOI : 10.1145/149439.133079

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

D. Beniamine, M. Diener, G. Huard, and P. O. Navaux, TABARNAC, Proceedings of the 2nd Workshop on Visual Performance Analysis, VPA '15, p.11, 2015.
DOI : 10.1145/2835238.2835239

URL : https://hal.archives-ouvertes.fr/hal-01221146

H. Boehm, A. J. Demers, and S. Shenker, Mostly Parallel Garbage Collection, Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, ser. PLDI '91, p.157-164, 1991.
DOI : 10.1145/113446.113459

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

J. Heo, S. Yi, Y. Cho, J. Hong, and S. Y. Shin, Space-efficient Page-level Incremental Checkpointing, Proceedings of the 2005 ACM Symposium on Applied Computing, ser. SAC '05, p.1558-1562, 2005.
DOI : 10.1145/1066677.1067026

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

S. T. Jones, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, Geiger: Monitoring the Buffer Cache in a Virtual Machine Environment, SIGARCH Comput. Archit. News, vol.34, issue.5, p.14-24, 2006.

C. S. Bae, L. Xia, P. Dinda, and J. Lange, Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, p.247-258, 2012.
DOI : 10.1145/2287076.2287114

M. Diener, E. H. Cruz, and P. O. Navaux, Communication-Based Mapping Using Shared Pages, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, p.700-711, 2013.
DOI : 10.1109/IPDPS.2013.57

V. J. Reddi and K. Hazelwood, PIN, Proceedings of the 2004 workshop on Computer architecture education held in conjunction with the 31st International Symposium on Computer Architecture, WCAE '04, p.190-200, 2005.
DOI : 10.1145/1275571.1275600

H. Jin, M. Frumkin, and J. Yan, The OpenMP implementation of NAS Parallel Benchmarks and Its Performance, NASA, Tech. Rep, 1999.

Inria Research Centre Grenoble - Rhône-Alpes, Inovallée, 655 avenue de l'Europe, Montbonnot, 38334 Saint Ismier Cedex
Publisher: Inria, Domaine de Voluceau - Rocquencourt, BP 105 - 78153 Le Chesnay Cedex, inria.fr
ISSN 0249-6399