
Locality-Aware Scheduling of Independent Tasks for Runtime Systems

Maxime Gonthier (1), Loris Marchal (1), Samuel Thibault (2)
1 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
2 STORM - STatic Optimizations, Runtime Methods
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: A now-classical way of meeting the increasing demand for computing speed from HPC applications is to use GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Particular care should therefore be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler knows all the tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data into the GPU memory (possibly after evicting some previously loaded data), while aiming to minimize data movements so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products, Cholesky factorization, and randomized 2D matrix operations. Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the total processing time.
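To make the setting of the abstract concrete, the following is a minimal sketch, not the authors' model or the StarPU implementation. It counts data loads for a fixed task order on a memory of bounded capacity and compares two eviction rules. The abstract does not say which eviction strategy is optimal; the "furthest" rule below is the classical furthest-in-future (Belady-style) rule, used here only as an assumption. The function name `count_loads`, the unit-size data items, and the 3x3 tile example are all hypothetical.

```python
def count_loads(order, inputs, capacity, policy="furthest"):
    """Count data loads when tasks run in `order` on a memory that holds
    at most `capacity` unit-size data items (illustrative model only)."""

    def next_use(item, step):
        # Index of the first later task that reads `item`, or infinity.
        for later in range(step + 1, len(order)):
            if item in inputs[order[later]]:
                return later
        return float("inf")

    memory, last_used, loads = set(), {}, 0
    for step, task in enumerate(order):
        needed = inputs[task]
        if len(needed) > capacity:
            raise ValueError("inputs of one task do not fit in memory")
        for item in sorted(needed):
            if item in memory:
                continue
            if len(memory) >= capacity:
                # Never evict an input of the task being processed.
                candidates = memory - needed
                if policy == "furthest":
                    victim = max(candidates, key=lambda x: (next_use(x, step), x))
                else:  # "lru": evict the least recently used item
                    victim = min(candidates, key=lambda x: (last_used[x], x))
                memory.remove(victim)
            memory.add(item)
            loads += 1
        for item in needed:
            last_used[item] = step
    return loads


# Hypothetical example: a 3x3 tiled product where task (i, j)
# reads row block A_i and column block B_j.
inputs = {(i, j): {f"A{i}", f"B{j}"} for i in range(3) for j in range(3)}
order = [(i, j) for i in range(3) for j in range(3)]  # row-major order

for policy in ("lru", "furthest"):
    print(policy, count_loads(order, inputs, capacity=3, policy=policy))
```

The tasks here share inputs but are otherwise independent, so any order is valid; changing `order` or `capacity` shows how both the ordering and the eviction rule affect the number of loads.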
Contributor: Equipe Roma
Submitted on: Tuesday, August 31, 2021 - 2:55:32 PM
Last modification on: Friday, July 1, 2022 - 3:51:46 AM


Files produced by the author(s)


  • HAL Id : hal-03144290, version 7


Maxime Gonthier, Loris Marchal, Samuel Thibault. Locality-Aware Scheduling of Independent Tasks for Runtime Systems. [Research Report] RR-9394, Inria Grenoble - Rhône-Alpes. 2021, pp.21. ⟨hal-03144290v7⟩


