Service interruption on Monday 11 July from 12:30 to 13:00: all the sites of the CCSD (HAL, Epiciences, SciencesConf, AureHAL) will be inaccessible (network hardware connection).
Skip to Main content Skip to Navigation
Conference papers

Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management

Andi Drebes 1 Antoniu Pop 1 Karine Heydemann 2 Albert Cohen 3 Nathalie Drach 2 
2 ALSOC - Architecture et Logiciels pour Systèmes Embarqués sur Puce
LIP6 - Laboratoire d'Informatique de Paris 6
3 Parkas - Parallélisme de Kahn Synchrone
DI-ENS - Département d'informatique - ENS Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR 8548
Abstract : Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. Yet these promises are undermined by non-uniform memory access (NUMA). We show that using NUMA-aware task and data placement, it is possible to preserve the uniform abstraction of both computing and memory resources for task-parallel programming models while achieving high data locality. Our data placement scheme guarantees that all accesses to task output data target the local memory of the accessing core. The complementary task placement heuristic improves the locality of task input data on a best effort basis. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability by eliminating false dependences and enabling fine-grained dynamic control over data placement. The algorithms are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences readily available in the run-time system and placement information from the operating system. We achieve 94% of local memory accesses on a 192-core system with 24 NUMA nodes, up to 5× higher performance than NUMA-aware hierarchical work-stealing, and even 5.6× compared to static interleaved allocation. Finally, we show that state-of-the-art dynamic page migration by the operating system cannot catch up with frequent affinity changes between cores and data and thus fails to accelerate task-parallel applications.
Document type :
Conference papers
Complete list of metadata

Cited literature [29 references]  Display  Hide  Download
Contributor : Albert Cohen Connect in order to contact the contributor
Submitted on : Sunday, January 29, 2017 - 5:31:21 PM
Last modification on : Thursday, March 17, 2022 - 10:08:45 AM
Long-term archiving on: : Sunday, April 30, 2017 - 12:25:16 PM


Publisher files allowed on an open archive



Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, Nathalie Drach. Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management. PACT'16 - ACM/IEEE Conference on Parallel Architectures and Compilation Techniques, Sep 2016, Haifa, Israel. pp.125 - 137, ⟨10.1145/2967938.2967946⟩. ⟨hal-01425743⟩



Record views


Files downloads