Fully-abstracted affinity optimization for task-based models

Jens Gustedt (1), Emmanuel Jeannot (2, 3), Farouk Mansouri (3)
(1) CAMUS - Compilation pour les Architectures MUlti-coeurS
Inria Nancy - Grand Est, ICube - Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie
(3) TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: Task-based models and runtimes are quite popular in the HPC community. They help to implement applications with a high level of abstraction while still applying different types of optimizations. An important optimization target is hardware affinity, which concerns matching application behavior (threads, communication, data) to the architecture topology (cores, caches, memory). In fact, a well-adapted placement of threads is key to achieving performance and scalability, especially on NUMA-SMP machines. However, this type of optimization is difficult: architectures are becoming increasingly complex, and application behavior changes with implementations and input parameters, e.g., problem size and number of threads. Thus, on their own, task-based runtimes often handle this optimization poorly and leave a lot of fine-tuning to the user. In this work, we propose a fully automatic, abstracted and portable affinity module. It produces and implements an optimized affinity strategy that combines knowledge about application characteristics and the architecture’s topology. Implemented in the backend of our task-based runtime ORWL, our approach was used to enhance the performance and the scalability of several unmodified ORWL-coded applications: matrix multiplication, a 2D stencil (Livermore Kernel 23), and a real-world video tracking application. On two SGI SMP machines with quite different hardware characteristics, our tests show spectacular performance improvements for this unmodified application code, due to a dramatic decrease in cache misses. A comparison to reference implementations using OpenMP confirms this performance gain of almost one order of magnitude.
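As a rough illustration of what "matching application behavior to the architecture topology" can mean, the sketch below places threads onto NUMA nodes greedily according to a thread-to-thread communication matrix, and compares the resulting cross-node traffic with a naive round-robin ("scatter") placement. This is not the ORWL affinity module or the algorithm used in this report; the chain-shaped communication pattern, the machine with two nodes of four cores, and all function names (`chain_comm`, `greedy_groups`, `remote_volume`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def chain_comm(n, volume):
    """Hypothetical communication matrix: each of n threads exchanges
    `volume` units only with its direct neighbours (1D stencil-like)."""
    comm = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        comm[i][i + 1] = comm[i + 1][i] = volume
    return comm


def greedy_groups(comm, n_nodes, cores_per_node):
    """Greedy placement sketch: fill one NUMA node at a time, seeding it with
    the unplaced thread of highest total traffic, then repeatedly adding the
    unplaced thread that communicates most with the threads already there."""
    n = len(comm)
    assert n <= n_nodes * cores_per_node, "more threads than cores"
    unassigned = set(range(n))
    totals = [sum(row) for row in comm]
    placement = {}
    for node in range(n_nodes):
        if not unassigned:
            break
        # sorted() makes tie-breaking deterministic (lowest thread id wins)
        seed = max(sorted(unassigned), key=lambda t: totals[t])
        group = [seed]
        unassigned.remove(seed)
        while len(group) < cores_per_node and unassigned:
            best = max(sorted(unassigned),
                       key=lambda t: sum(comm[t][g] for g in group))
            group.append(best)
            unassigned.remove(best)
        for t in group:
            placement[t] = node
    return placement


def remote_volume(comm, placement):
    """Total traffic between thread pairs placed on different NUMA nodes."""
    return sum(comm[a][b] for a, b in combinations(range(len(comm)), 2)
               if placement[a] != placement[b])


if __name__ == "__main__":
    comm = chain_comm(8, 10)
    greedy = greedy_groups(comm, n_nodes=2, cores_per_node=4)
    round_robin = {t: t % 2 for t in range(8)}
    print(remote_volume(comm, greedy), remote_volume(comm, round_robin))  # prints 10 70
```

For this neighbour-only pattern the greedy sketch keeps threads 0 to 3 on one node and 4 to 7 on the other, so only one neighbour pair crosses nodes, whereas round-robin separates every neighbour pair. Real placement tools additionally account for deeper hierarchies (caches, sockets, cores) and for the actual measured communication, which this toy model ignores.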

Contributor: Jens Gustedt
Submitted on: Monday, December 5, 2016 - 4:33:04 PM
Last modification on: Thursday, May 16, 2019 - 6:46:13 PM
Long-term archiving on: Thursday, March 23, 2017 - 12:40:40 AM


HAL Id: hal-01409101, version 1


Jens Gustedt, Emmanuel Jeannot, Farouk Mansouri. Fully-abstracted affinity optimization for task-based models. [Research Report] RR-8993, INRIA Nancy. 2016. ⟨hal-01409101⟩


