The importance of data locality in distributed computing applications, NSF Workflow Workshop, 2006.
Using locality information in userlevel scheduling, p.91058.
Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, pp.24-29, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747
Feedback-directed thread scheduling with memory considerations, Proceedings of the 16th international symposium on High performance distributed computing, HPDC '07, pp.97-106, 2007.
DOI : 10.1145/1272366.1272380
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT '04).
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Advancing application process affinity experimentation, Proceedings of the 20th European MPI Users' Group Meeting on, EuroMPI '13, pp.163-168, 2013.
DOI : 10.1145/2488551.2488603
Locality and Topology Aware Intra-node Communication among Multicore CPUs, Proceedings of the 17th European MPI Users Group Conference, pp.265-274, 2010.
DOI : 10.1007/978-3-642-15646-5_28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.180.4377
Adaptive MPI Multirail Tuning for Non-Uniform Input/Output Access, Recent Advances in the Message Passing Interface, The 17th European MPI Users' Group Meeting, Lecture Notes in Computer Science, 2010.
The Design of OpenMP Thread Affinity, OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012.
DOI : 10.1007/978-3-642-30961-8_2
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38
Analytical modeling and optimization for affinity based thread scheduling on multicore systems, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
DOI : 10.1109/CLUSTR.2009.5289173
A benchmark-based performance model for memory-bound HPC applications, 2014 International Conference on High Performance Computing & Simulation (HPCS), pp.943-950, 2014.
DOI : 10.1109/HPCSim.2014.6903790
URL : https://hal.archives-ouvertes.fr/hal-00985598
LogP: Towards a Realistic Model of Parallel Computation, Principles and Practice of Parallel Programming, pp.1-12, 1993.
MPIPP, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.353-360, 2006.
DOI : 10.1145/1183401.1183451
Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.
DOI : 10.1016/j.jpdc.2014.06.008
URL : https://hal.archives-ouvertes.fr/hal-01017319
Multi-core aware process mapping and its impact on communication overhead of parallel applications, 2009 IEEE Symposium on Computers and Communications, pp.811-817, 2009.
DOI : 10.1109/ISCC.2009.5202271
Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite, Computers & Electrical Engineering, vol.38, issue.2, pp.258-269, 2012.
DOI : 10.1016/j.compeleceng.2011.12.007
ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010.
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295
Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.4, pp.993-1002.
DOI : 10.1109/TPDS.2013.104
URL : https://hal.archives-ouvertes.fr/hal-00803548
DAGuE: A generic distributed DAG engine for High Performance Computing, extensions for Next-Generation Parallel Programming Models, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003
Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc), 2014 International Conference on High Performance Computing & Simulation (HPCS), pp.74-81, 2014.
DOI : 10.1109/HPCSim.2014.6903671
URL : https://hal.archives-ouvertes.fr/hal-00985096
Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications, Proceedings of the Second International Symposium on Memory Systems, MEMSYS '16, pp.30-39, 2016.
DOI : 10.1145/2989081.2989115
URL : https://hal.archives-ouvertes.fr/hal-01330194