Skip to Main content Skip to Navigation
Conference papers

READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling

Nathan Grinsztajn 1 Olivier Beaumont 2 Emmanuel Jeannot 3 Philippe Preux 1 
1 Scool - Scool
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
2 HiePACS - High-End Parallel Algorithms for Challenging Numerical Simulations
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
3 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract : In this paper, we propose READYS, a reinforcement learning algorithm for the dynamic scheduling of computations modeled as a Directed Acyclic Graph (DAGs). Our goal is to develop a scheduling algorithm in which allocation and scheduling decisions are made at runtime, based on the state of the system, as performed in runtime systems such as StarPU or ParSEC. Reinforcement Learning is a natural candidate to achieve this task, since its general principle is to build step by step a strategy that, given the state of the system (the state of the resources and a view of the ready tasks and their successors in our case), makes a decision to optimize a global criterion. Moreover, the use of Reinforcement Learning is natural in a context where the duration of tasks (and communications) is stochastic. We propose READYS that combines Graph Convolutional Networks (GCN) with an Actor-Critic Algorithm (A2C): it builds an adaptive representation of the scheduling problem on the fly and learns a scheduling strategy, aiming at minimizing the makespan. A crucial point is that READYS builds a general scheduling strategy which is neither limited to only one specific application or task graph nor one particular problem size, and that can be used to schedule any DAG. We focus on different types of task graphs originating from linear algebra factorization kernels (CHOLESKY, LU, QR) and we consider heterogeneous platforms made of a few CPUs and GPUs. We first propose to analyze the performance of READYS when learning is performed on a given (platform, kernel, problem size) combination. Using simulations, we show that the scheduling agent obtains performances very similar or even superior to algorithms from the literature, and that it is especially powerful when the scheduling environment contains a lot of uncertainty. We additionally demonstrate that our agent exhibits very promising generalization capabilities. To the best of our knowledge, this is the first paper which shows that reinforcement learning can really be used for dynamic DAG scheduling on heterogeneous resources.
Complete list of metadata

https://hal.inria.fr/hal-03313229
Contributor : Emmanuel Jeannot Connect in order to contact the contributor
Submitted on : Tuesday, August 3, 2021 - 3:47:14 PM
Last modification on : Sunday, June 26, 2022 - 4:17:54 AM
Long-term archiving on: : Thursday, November 4, 2021 - 7:03:21 PM

File

cluster.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03313229, version 1

Collections

Citation

Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux. READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling. IEEE Cluster 2021, Sep 2021, Portland / Virtual, United States. ⟨hal-03313229⟩

Share

Metrics

Record views

382

Files downloads

319