Obtaining Dynamic Scheduling Policies with Simulation and Machine Learning

Danilo Carastan-Santos 1, 2 Raphael Y. De Camargo 2
1 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Dynamic scheduling of tasks on large-scale HPC platforms is normally accomplished using ad-hoc heuristics based on task characteristics, combined with some backfilling strategy. Defining heuristics that work efficiently in different scenarios is difficult, especially given the large variety of task types and platform architectures. In this work, we present a methodology based on simulation and machine learning to obtain dynamic scheduling policies. Using simulations and a workload generation model, we determine the characteristics of tasks that lead to a reduction in the mean slowdown of tasks in an execution queue. Modeling these characteristics with a nonlinear function, and applying this function to select the next task to execute from the queue, dramatically improved the mean task slowdown on synthetic workloads. When applied to real workload traces from highly different machines, these functions still yielded substantial performance improvements, attesting to the generalization capability of the obtained heuristics.
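To make the idea of "applying a function to select the next task" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the `Task` fields, the single-resource simulation, and above all the `score` function with its weights are hypothetical placeholders, and slowdown is taken as the usual ratio of turnaround time to run time. The sketch compares first-come-first-served ordering with ordering by a nonlinear score over task characteristics.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    submit_time: float  # when the task entered the queue (seconds)
    runtime: float      # processing time (seconds)
    cores: int          # number of cores requested


def slowdown(wait_time: float, runtime: float) -> float:
    """Slowdown of one task: (waiting time + run time) / run time."""
    return (wait_time + runtime) / runtime


def score(task: Task) -> float:
    """Hypothetical nonlinear priority function; the lowest score runs first.

    The functional form and weights are placeholders chosen for this
    example, not the functions obtained in the paper.
    """
    return math.log10(task.runtime) * task.cores + math.log10(task.submit_time + 1.0)


def pick_next(queue: List[Task], key: Callable[[Task], float]) -> Task:
    """Select the queued task with the smallest key value."""
    return min(queue, key=key)


def mean_slowdown(tasks: List[Task], key: Callable[[Task], float]) -> float:
    """Run every task one at a time on a single resource (an assumption
    made to keep the sketch short) and return the mean slowdown."""
    queue = list(tasks)
    clock = max(t.submit_time for t in queue)  # start once all tasks are queued
    total = 0.0
    while queue:
        task = pick_next(queue, key)
        queue.remove(task)
        total += slowdown(clock - task.submit_time, task.runtime)
        clock += task.runtime
    return total / len(tasks)


tasks = [
    Task("A", submit_time=0.0, runtime=1000.0, cores=4),
    Task("B", submit_time=10.0, runtime=10.0, cores=1),
    Task("C", submit_time=20.0, runtime=100.0, cores=2),
]

fcfs = mean_slowdown(tasks, key=lambda t: t.submit_time)
scored = mean_slowdown(tasks, key=score)
print(f"FCFS mean slowdown:   {fcfs:.2f}")
print(f"Scored mean slowdown: {scored:.2f}")
```

In this toy queue the scored ordering runs the short, narrow tasks before the long, wide one, which lowers the mean slowdown relative to first-come-first-served. The methodology described in the abstract derives such a function from simulation data rather than fixing it by hand as done here.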
Document type:
Conference paper
SC'17 - International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Nov 2017, Denver, United States

Cited literature [24 references]

https://hal.inria.fr/hal-01618940
Contributor: Danilo Carastan dos Santos
Submitted on: Wednesday, October 18, 2017 - 17:16:59
Last modified on: Wednesday, April 11, 2018 - 01:55:01
Document(s) archived on: Friday, January 19, 2018 - 14:04:37

File

paper-hal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01618940, version 1

Citation

Danilo Carastan-Santos, Raphael Y. De Camargo. Obtaining Dynamic Scheduling Policies with Simulation and Machine Learning. SC'17 - International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Nov 2017, Denver, United States. 〈hal-01618940〉
