HPCTOOLKIT: tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, pp.685-701, 2010.
DOI : http://doi.acm.org/10.1145/1654059.1654111
HMTT: A Platform Independent Full-system Memory Trace Monitoring System, Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '08, pp.229-240, 2008.
Rivet, ACM SIGGRAPH Computer Graphics, vol.34, issue.1, pp.68-73, 2000.
DOI : 10.1145/563788.604455
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems, Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp.381-393, 2013.
SIGMA: A Simulator Infrastructure to Guide Memory Analysis, ACM/IEEE SC 2002 Conference (SC'02)
DOI : 10.1109/SC.2002.10055
The Hardware Performance Monitor Toolkit, Euro-Par 2001 Parallel Processing, pp.122-132
Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp.9-16, 2015.
DOI : 10.1109/PDP.2015.11
kMAF, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, 2014.
DOI : 10.1145/2628071.2628085
What every programmer should know about memory, 2007.
Instruction-based sampling: A new performance analysis technique for AMD family 10h processors, 2007.
An introduction to analysis and optimization with AMD CodeAnalyst Performance Analyzer, 2008.
Exploiting Intensive Multithreading for the Efficient Simulation of 3D Seismic Wave Propagation, IEEE International Conference on Computational Science and Engineering (CSE), pp.253-260, 2008.
Dissecting On-Node Memory Access Performance: A Semantic Approach, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.166-176, 2014.
DOI : 10.1109/SC.2014.19
Intel Performance Counter Monitor - A better way to measure CPU utilization, 2012.
Understanding the behavior of in-memory computing workloads, 2014 IEEE International Symposium on Workload Characterization (IISWC), pp.22-30, 2014.
DOI : 10.1109/IISWC.2014.6983036
The OpenMP implementation of NAS Parallel Benchmarks and Its Performance, 1999.
MemProf: A Memory Profiler for NUMA Multicore Systems, USENIX 2012 Annual Technical Conference (USENIX ATC 12), pp.53-64, 2012.
A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pp.259-272, 2014.
affinity-on-next-touch: Increasing the Performance of an Industrial PDE Solver on a cc-NUMA System, International Conference on Supercomputing (SC), pp.387-392, 2005.
Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation, Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pp.190-200, 2005.
(Mis)understanding the NUMA memory system performance of multithreaded workloads, 2013 IEEE International Symposium on Workload Characterization (IISWC), pp.11-22, 2013.
DOI : 10.1109/IISWC.2013.6704666
Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems, International Parallel Processing Symposium (IPPS), pp.480-485, 1995.
MemSpy: Analyzing Memory System Bottlenecks in Programs, Proceedings of the 1992 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '92/PERFORMANCE '92, pp.1-12, 1992.
Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.87-96, 2010.
DOI : 10.1109/ISPASS.2010.5452060
Compiler support for selective page migration in NUMA architectures, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.369-380, 2014.
DOI : 10.1145/2628071.2628077
VTune performance analyzer essentials, 2005.
Memory Affinity for Hierarchical Shared Memory Multiprocessors, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.59-66, 2009.
Visualizing the Memory Access Behavior of Shared Memory Applications on NUMA Architectures
DOI : 10.1007/3-540-45718-6_91
Visualization of Memory Access Behavior on Hierarchical NUMA Architectures, 2014 First Workshop on Visual Performance Analysis, pp.42-49, 2014.
DOI : 10.1109/VPA.2014.12
Inovallée, 655 avenue de l'Europe, Montbonnot, 38334 Saint Ismier Cedex. Publisher: Inria, Domaine de Voluceau - Rocquencourt, BP 105 - 78153 Le Chesnay Cedex, inria, ISSN 0249-6399