D. Molka, D. Hackenberg, R. Schöne, and W. Nagel, Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture, 2015 44th International Conference on Parallel Processing, pp.739-748, 2015.
DOI : 10.1109/ICPP.2015.83

S. Blagodurov, S. Zhuravlev, M. Dashti, and A. Fedorova, A case for NUMA-aware contention management on multicore systems, Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT '10, pp.1-1, 2010.
DOI : 10.1145/1854273.1854350

B. Lepers, V. Quéma, and A. Fedorova, Thread and memory placement on NUMA systems: Asymmetry matters, 2015 USENIX Annual Technical Conference (USENIX ATC 15), pp.277-289, 2015.

P. J. Drongowski, Instruction-based sampling: A new performance analysis technique for AMD Family 10h processors, 2007.

J. Dongarra, K. London, S. Moore, P. Mucci, and D. Terpstra, Using PAPI for hardware performance monitoring on Linux systems, Conference on Linux Clusters: The HPC Revolution, Linux Clusters Institute, 2001.

I. Lopez, S. Moore, and V. Weaver, A prototype sampling interface for PAPI, Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, ser. XSEDE '15, pp.271-298, 2015.
DOI : 10.1145/2792745.2792772

C. McCurdy and J. Vetter, Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.87-96, 2010.
DOI : 10.1109/ISPASS.2010.5452060

R. Lachaize, B. Lepers, and V. Quéma, MemProf: A memory profiler for NUMA multicore systems, Proceedings of the 2012 USENIX Conference on Annual Technical Conference, ser. USENIX ATC'12, pp.5-5, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00945731

M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize et al., Traffic management: A holistic approach to memory placement on NUMA systems, Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS '13, pp.381-394, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00945758

X. Liu and J. Mellor-Crummey, A tool to analyze the performance of multithreaded programs on NUMA architectures, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '14, pp.259-272, 2014.

X. Liu and B. Wu, ScaAnalyzer, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pp.1-47, 2015.
DOI : 10.1145/1952682.1952688

M. Selva, Performance Monitoring of Throughput Constrained Dataflow Programs Executed On Shared-Memory Multi-core Architectures, Theses, Institut National des Sciences Appliquées de Lyon, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01264258

H. Yviquel, A. Lorence, K. Jerbi, G. Cocherel, A. Sanchez et al., Orcc, Proceedings of the 21st ACM international conference on Multimedia, MM '13, pp.863-866, 2013.
DOI : 10.1145/2502081.2502231
URL : https://hal.archives-ouvertes.fr/hal-01059858