D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS Parallel Benchmarks, The International Journal of Supercomputer Applications, 1991.

C. Bienia, S. Kumar, J. Singh, and K. Li, The PARSEC Benchmark Suite: Characterization and Architectural Implications, Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT '08), pp.72-81, 2008.

C. M. Bishop, Pattern recognition and machine learning, pp.4-11, 2006.

C. M. Bishop, Pattern recognition and machine learning, pp.179-207, 2006.

S. Blagodurov, S. Zhuravlev, and A. Fedorova, Contention-aware scheduling on multicore systems, ACM Transactions on Computer Systems (TOCS), vol.28, 2010.

L. Breiman, Random forests, Machine learning, vol.45, pp.5-32, 2001.

Broadwell, 2014.

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, 2010.

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: an efficient OpenMP environment for NUMA architectures, International Journal of Parallel Programming, 2010.

M. Castro, L. F. Goes, C. P. Ribeiro, M. Cole, M. Cintra et al., A machine learning-based approach for thread mapping on transactional memory applications, 18th International Conference on High Performance Computing, pp.1-10, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00788791

C. Chang and C. Lin, LIBSVM: a library for support vector machines, ACM transactions on intelligent systems and technology (TIST), vol.2, p.27, 2011.

The Coral Benchmarks Codes, 2018.

E. H. M. Cruz, M. Diener, M. A. Z. Alves et al., Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols, J. Parallel and Distrib. Comput., vol.74, pp.2215-2228, 2014.

E. H. M. Cruz, M. Diener, L. L. Pilla et al., EagerMap: a task mapping algorithm to improve communication and load balancing in clusters of multicore systems, ACM Transactions on Parallel Computing (TOPC), vol.5, p.17, 2019.

M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize et al., Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems, Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13), pp.381-394, 2013.

A. de Blanche and T. Lundqvist, Addressing characterization methods for memory contention aware co-scheduling, The Journal of Supercomputing, pp.1-33, 2015.

M. Diener, E. H. M. Cruz, and P. O. A. Navaux, Locality vs. Balance: Exploring data mapping policies on NUMA systems, 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp.9-16, 2015.

M. Diener, E. H. M. Cruz, P. O. A. Navaux, A. Busse et al., Communication-aware Process and Thread Mapping Using Online Communication Detection, Parallel Comput., vol.43, pp.43-63, 2015.

M. Diener, E. H. M. Cruz, L. L. Pilla, F. Dupros et al., Characterizing communication and page usage of parallel applications for thread and data mapping, Performance Evaluation, pp.18-36, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01146859

M. Diener, E. H. M. Cruz, M. A. Z. Alves et al., Affinity-Based Thread and Data Mapping in Shared Memory Systems, ACM Computing Surveys, vol.49, p.38, 2016.

V. Klema and A. Laub, The singular value decomposition: Its computation and some applications, IEEE Trans. Automat. Control, vol.25, pp.164-176, 1980.

T. Klug, M. Ott, J. Weidendorfer, and C. Trinitis, autopin - automated optimization of thread-to-core pinning on multicore systems, Transactions on high-performance embedded architectures and compilers III, pp.219-235, 2011.

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin: building customized program analysis tools with dynamic instrumentation, ACM SIGPLAN Notices, vol.40, pp.190-200, 2005.

Z. Majo and T. R. Gross, Memory Management in NUMA Multicore Systems: Trapped Between Cache Contention and Interconnect Overhead, SIGPLAN Not., vol.46, pp.11-20, 2011.

J. Marathe and F. Mueller, Hardware Profile-guided Automatic Page Placement for ccNUMA Systems, Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '06), pp.90-99, 2006.

J. Marathe, V. Thakkar, and F. Mueller, Feedback-directed Page Placement for ccNUMA via Hardware-generated Memory Traces, J. Parallel and Distrib. Comput, vol.70, pp.1204-1219, 2010.

Methodology Implementation for Reproducibility of the Paper Experiments, Methodology Implementation, 2019.

P. J. Mucci, S. Browne, C. Deane, and G. Ho, PAPI: A portable interface to hardware performance counters, Proceedings of the department of defense HPCMP users group conference, vol.710, 1999.

P. Radojkovic, V. Cakarevic, J. Verdu, A. Pajuelo, F. J. Cazorla et al., Thread assignment of multithreaded network applications in multicore/multithreaded processors, IEEE Transactions on Parallel and Distributed Systems, vol.24, pp.2513-2525, 2013.

Z. Wang and M. F. P. O'Boyle, Mapping Parallelism to Multi-cores: A Machine Learning Based Approach, Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '09), pp.75-84, 2009.

S. Zhuravlev, S. Blagodurov, and A. Fedorova, Addressing Shared Resource Contention in Multicore Processors via Scheduling, Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV), pp.129-142, 2010.

S. Zhuravlev, J. C. Saez, S. Blagodurov, A. Fedorova, and M. Prieto, Survey of scheduling techniques for addressing shared resources in multicore processors, ACM Computing Surveys (CSUR), vol.45, p.4, 2012.