hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010.
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295
Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis, 2009 International Conference on Parallel Processing, pp.462-469, 2009.
DOI : 10.1109/ICPP.2009.22
URL : https://hal.archives-ouvertes.fr/inria-00390064
Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.
DOI : 10.1016/j.jpdc.2014.06.008
URL : https://hal.archives-ouvertes.fr/hal-01017319
MPIPP, Proceedings of the 20th Annual International Conference on Supercomputing, ICS '06, pp.353-360, 2006.
DOI : 10.1145/1183401.1183451
LogP: Towards a Realistic Model of Parallel Computation, Principles and Practice of Parallel Programming, pp.1-12, 1993.
Euro-Par 2014: Parallel Processing Workshops: Euro-Par 2014 International Workshops Revised Selected Papers, Part II, chapter Exploiting Hidden Non-uniformity of Uniform Memory Access on Manycore CPUs, pp.242-253, 2014.
Netloc: Towards a Comprehensive View of the HPC System Topology, 2014 43rd International Conference on Parallel Processing Workshops, pp.216-225, 2014.
DOI : 10.1109/ICPPW.2014.38
URL : https://hal.archives-ouvertes.fr/hal-01010599
Dodging Non-uniform I/O Access in Hierarchical Collective Operations for Multicore Clusters, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp.788-794, 2011.
DOI : 10.1109/IPDPS.2011.222
URL : https://hal.archives-ouvertes.fr/inria-00566246
Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite, Computers & Electrical Engineering, vol.38, issue.2, pp.258-269, 2012.
DOI : 10.1016/j.compeleceng.2011.12.007
Generic topology mapping strategies for large-scale parallel architectures, Proceedings of the International Conference on Supercomputing, ICS '11, pp.75-85, 2011.
DOI : 10.1145/1995896.1995909
Advancing application process affinity experimentation, Proceedings of the 20th European MPI Users' Group Meeting, EuroMPI '13, pp.163-168, 2013.
DOI : 10.1145/2488551.2488603
Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.4, pp.993-1002, 2014.
DOI : 10.1109/TPDS.2013.104
URL : https://hal.archives-ouvertes.fr/hal-00803548
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT '04), pp.111-122, 2004.
Locality and Topology Aware Intra-node Communication among Multicore CPUs, Proceedings of the 17th European MPI Users' Group Conference, number 6305 in Lecture Notes in Computer Science, pp.265-274, 2010.
DOI : 10.1007/978-3-642-15646-5_28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.180.4377
Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, pp.24-29, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747
Adaptive MPI Multirail Tuning for Non-uniform Input/Output Access, Lecture Notes in Computer Science, vol.6305, pp.239-248, 2010.
DOI : 10.1007/978-3-642-15646-5_25
URL : https://hal.archives-ouvertes.fr/inria-00486178
Scotch and libScotch 5.1 User's Guide, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00410332
A benchmark-based performance model for memory-bound HPC applications, 2014 International Conference on High Performance Computing & Simulation (HPCS), pp.943-950, 2014.
DOI : 10.1109/HPCSim.2014.6903790
URL : https://hal.archives-ouvertes.fr/hal-00985598
Multi-core aware process mapping and its impact on communication overhead of parallel applications, 2009 IEEE Symposium on Computers and Communications, pp.811-817, 2009.
DOI : 10.1109/ISCC.2009.5202271
Feedback-directed thread scheduling with memory considerations, Proceedings of the 16th International Symposium on High Performance Distributed Computing, HPDC '07, pp.97-106, 2007.
DOI : 10.1145/1272366.1272380
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.8567
Analytical modeling and optimization for affinity based thread scheduling on multicore systems, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
DOI : 10.1109/CLUSTR.2009.5289173
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.535.5485
Using locality information in user-level scheduling, 1995.
Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
DOI : 10.1109/SC.2012.47
The importance of data locality in distributed computing applications, NSF Workflow Workshop, 2006.
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38
URL : http://arxiv.org/abs/1004.4431