Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture, 2015 44th International Conference on Parallel Processing, pp. 739-748, 2015.
DOI: 10.1109/ICPP.2015.83
A case for NUMA-aware contention management on multicore systems, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pp. 1-1, 2011.
DOI: 10.1145/1854273.1854350
Thread and memory placement on NUMA systems: Asymmetry matters, 2015 USENIX Annual Technical Conference (USENIX ATC 15), pp. 277-289, 2015.
Instruction-based sampling: A new performance analysis technique for AMD Family 10h processors, 2007.
Using PAPI for hardware performance monitoring on Linux systems, Conference on Linux Clusters: The HPC Revolution, Linux Clusters Institute, 2001.
A prototype sampling interface for PAPI, Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, XSEDE '15, pp. 271-298, 2015.
DOI: 10.1145/2792745.2792772
Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp. 87-96, 2010.
DOI: 10.1109/ISPASS.2010.5452060
MemProf: A memory profiler for NUMA multicore systems, Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC '12, pp. 5-5, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00945731
Traffic management: A holistic approach to memory placement on NUMA systems, Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pp. 381-394, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00945758
A tool to analyze the performance of multithreaded programs on NUMA architectures, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pp. 259-272, 2014.
ScaAnalyzer, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pp. 1-47, 2015.
DOI: 10.1145/1952682.1952688
Performance Monitoring of Throughput Constrained Dataflow Programs Executed On Shared-Memory Multi-core Architectures, PhD thesis, Institut National des Sciences Appliquées de Lyon, 2015.
URL: https://hal.archives-ouvertes.fr/tel-01264258
Orcc, Proceedings of the 21st ACM International Conference on Multimedia, MM '13, pp. 863-866, 2013.
DOI: 10.1145/2502081.2502231
URL: https://hal.archives-ouvertes.fr/hal-01059858