Conference papers

Automatic, Abstracted and Portable Topology-Aware Thread Placement

Abstract: Efficiently programming shared-memory machines is difficult because the mapping of application threads onto the memory hierarchy has a strong impact on performance. However, optimizing such thread placement is hard: architectures are becoming increasingly complex, and application behavior changes with implementations and input parameters, e.g., problem size and number of threads. In this work, we propose a fully automatic, abstracted and portable affinity module. It produces and implements an optimized affinity strategy that combines knowledge about application characteristics and the platform topology. Implemented in the back-end of our runtime system (ORWL), our approach was used to enhance the performance and scalability of several unmodified ORWL-coded applications: matrix multiplication, a 2D stencil (Livermore Kernel 23), and a real-world video tracking application. On two SMP machines with quite different hardware characteristics, our tests show spectacular performance improvements for these unmodified application codes, due to a dramatic decrease in cache misses and pipeline stalls. A comparison to reference implementations using OpenMP confirms this performance gain of almost one order of magnitude.

Cited literature: 15 references
Contributor: Farouk Mansouri
Submitted on: Wednesday, October 25, 2017 - 9:30:22 PM
Last modification on: Thursday, May 16, 2019 - 6:46:13 PM
Archived on: Friday, January 26, 2018 - 12:23:13 PM





Jens Gustedt, Emmanuel Jeannot, Farouk Mansouri. Automatic, Abstracted and Portable Topology-Aware Thread Placement. IEEE Cluster, Sep 2017, Hawaii, United States. pp. 389-399, ⟨10.1109/CLUSTER.2017.71⟩. ⟨hal-01621936⟩


