L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin et al., HPCTOOLKIT: tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, pp.685-701, 2010.
DOI : 10.1145/1654059.1654111

G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, AFIPS '67 (Spring), 1967.
DOI : 10.1145/1465482.1465560

K. Asanović, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands et al., The Landscape of Parallel Computing Research: A View from Berkeley, 2006.

S. Blagodurov, S. Zhuravlev, A. Fedorova, and A. Kamali, A case for NUMA-aware contention management on multicore systems, PACT, 2010.

F. Broquedis, J. C. Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci, A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, ACM/IEEE SC 2000 Conference (SC'00), 2000.
DOI : 10.1109/SC.2000.10029

M. Burtscher, B. Kim, J. Diamond, J. McCalpin, L. Koesterke et al., PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.41

A. Carvalho de Melo, Performance counters on Linux, Linux Plumbers Conference, 2009.

K. De Bosschere, High-Performance Embedded Architecture and Compilation Roadmap, chapter 1, LNCS, 2007.

M. Ertl and D. Gregg, The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures, Euro-Par Parallel Processing, 2001.
DOI : 10.1007/3-540-44681-8_59

R. Hundt, E. Raman, M. Thuresson, and N. Vachharajani, MAO — An extensible micro-architectural optimizer, International Symposium on Code Generation and Optimization (CGO 2011), 2011.
DOI : 10.1109/CGO.2011.5764669

IEEE Task P754, IEEE 754-2008, Standard for Floating-Point Arithmetic, 2008.

A. Ilic, F. Pratas, and L. Sousa, Cache-aware Roofline model: Upgrading the loft, IEEE Computer Architecture Letters, vol.13, issue.1, p.1, 2013.
DOI : 10.1109/L-CA.2013.6

Intel, Technologies for measuring software performance.

Intel, Intel 64 and IA-32 Architectures Optimization Reference Manual, 2011.

M. E. Maxwell, P. J. Teller, L. Salayandia, and S. Moore, Accuracy of performance monitoring hardware, Los Alamos Computer Science Institute Symposium, 2002.

S. V. Moore, A comparison of counting and sampling modes of using performance monitoring hardware, ICCS, 2002.

T. Mytkowicz, A. Diwan, M. Hauswirth, and P. Sweeney, We have it easy, but do we have it right?, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536408

G. Ofenbeck, R. Steinmann, V. Caparros, D. G. Spampinato et al., Applying the roofline model, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2014.
DOI : 10.1109/ISPASS.2014.6844463

E. Rohou, Tiptop: Hardware Performance Counters for the Masses, 2012 41st International Conference on Parallel Processing Workshops, 2012.
DOI : 10.1109/ICPPW.2012.58

URL : https://hal.archives-ouvertes.fr/hal-00639173

B. Sprunt, The basics of performance-monitoring hardware, IEEE Micro, vol.22, issue.4, 2002.

S. Williams, A. Waterman, and D. Patterson, Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.
DOI : 10.1145/1498765.1498785