L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin et al., HPCTOOLKIT: tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, pp.685-701, 2010.
DOI : http://doi.acm.org/10.1145/1654059.1654111

D. Beniamine, Cartographier la mémoire virtuelle d'une application de calcul scientifique, ComPAS'2013 / RenPar'21, 2013.

A. R. Bernat and B. P. Miller, Anywhere, Any-time Binary Instrumentation, Proceedings of the 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools, pp.9-16, 2011.

K. Beyls and E. H. D'Hollander, Reuse Distance as a Metric for Cache Behavior, Proceedings of the IASTED Conference on Parallel and Distributed Computing and Systems, pp.617-622, 2001.

E. H. M. Cruz, M. Diener, M. A. Alves, and P. O. Navaux, Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols, Journal of Parallel and Distributed Computing, vol.74, issue.3, pp.2215-2228, 2014.

M. Diener, E. H. Cruz, and P. O. Navaux, Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory, Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pp.532-543, 2012.

M. Diener, E. H. Cruz, and P. O. Navaux, Communication-Based Mapping Using Shared Pages, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.700-711, 2013.
DOI : 10.1109/IPDPS.2013.57

M. Diener, F. L. Madruga, E. R. Rodrigues, M. A. Alves, J. Schneider et al., Evaluating Thread Placement Based on Memory Access Patterns for Multicore Processors, High Performance Computing and Communications (HPCC) 12th IEEE International Conference on, pp.491-496, 2010.

D. Dosimont, L. Mello-Schnorr, G. Huard, and J. Vincent, A Trace Macroscopic Description based on Time Aggregation, 2014.

U. Drepper, What every programmer should know about memory, Red Hat, 2007.

P. J. Drongowski, Instruction-based sampling: A new performance analysis technique for AMD family 10h processors, 2007.

R. Lachaize, B. Lepers, and V. Quema, MemProf: A Memory Profiler for NUMA Multicore Systems, USENIX 2012 Annual Technical Conference (USENIX ATC 12), pp.53-64, 2012.

G. Pagano, D. Dosimont, G. Huard, V. Marangozova-Martin, J. M. Vincent et al., Trace Management and Analysis for Embedded Systems, Embedded Multicore SoCs (MCSoC), 2013 IEEE 7th International Symposium on, pp.119-122, 2013.

V. Pillet, J. Labarta, T. Cortes, and S. Girona, PARAVER: A Tool to Visualize and Analyze Parallel Code, Proceedings of WoTUG-18: Transputer and occam Developments, pp.17-31, 1995.

J. Reinders, VTune performance analyzer essentials, 2005.

C. Ruiz, S. Harrache, M. Mercier, and O. Richard, Reconstructable Software Appliances with Kameleon, ACM SIGOPS Operating Systems Review, vol.49, issue.1, pp.80-89, 2015.
DOI : 10.1145/2723872.2723883
URL : https://hal.archives-ouvertes.fr/hal-01334135

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, 2010.
DOI : 10.1109/ICPPW.2010.38

V. M. Weaver, D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula et al., PAPI 5: Measuring power, energy, and the cloud, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.124-125, 2013.
DOI : 10.1109/ISPASS.2013.6557155