HPCTOOLKIT: tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, pp.685-701, 2010. ,
DOI : http://doi.acm.org/10.1145/1654059.1654111
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference on, AFIPS '67 (Spring), 1967. ,
DOI : 10.1145/1465482.1465560
The Landscape of Parallel Computing Research: A View from Berkeley, 2006. ,
A case for NUMAaware contention management on multicore systems, PACT, 2010. ,
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, ACM/IEEE SC 2000 Conference (SC'00), 2000. ,
DOI : 10.1109/SC.2000.10029
PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010. ,
DOI : 10.1109/SC.2010.41
Performance counters on Linux, Linux Plumbers Conference, 2009. ,
High-Performance Embedded Architecture and Compilation Roadmap, chapter 1, LNCS, 2007. ,
The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures, Euro-Par Parallel Processing, 2001. ,
DOI : 10.1007/3-540-44681-8_59
MAO — An extensible micro-architectural optimizer, International Symposium on Code Generation and Optimization (CGO 2011), 2011. ,
DOI : 10.1109/CGO.2011.5764669
IEEE 754-2008, Standard for Floating-Point Arithmetic, 2008. ,
Cache-aware Roofline model: Upgrading the loft, IEEE Computer Architecture Letters, vol.13, issue.1, p.1, 2013. ,
DOI : 10.1109/L-CA.2013.6
Technologies for measuring software performance ,
Intel64 and IA-32 Architectures Optimization Reference Manual, 2011. ,
Accuracy of performance monitoring hardware, Los Alamos Computer Science Institute Symposium, 2002. ,
A comparison of counting and sampling modes of using performance monitoring hardware, ICCS, 2002. ,
We have it easy, but do we have it right?, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008. ,
DOI : 10.1109/IPDPS.2008.4536408
Applying the roofline model, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2014. ,
DOI : 10.1109/ISPASS.2014.6844463
Tiptop: Hardware Performance Counters for the Masses, 2012 41st International Conference on Parallel Processing Workshops, 2012. ,
DOI : 10.1109/ICPPW.2012.58
URL : https://hal.archives-ouvertes.fr/hal-00639173
The basics of performance-monitoring hardware. Micro, IEEE, vol.22, issue.4, 2002. ,
Roofline, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009. ,
DOI : 10.1145/1498765.1498785