Obtaining Dynamic Scheduling Policies with Simulation and Machine Learning

Danilo Carastan-Santos; Raphael Yokoingawa de Camargo

Conference paper. Year: 2017



Danilo Carastan-Santos

Role: Author
PersonId : 1021121
IdHAL : danilo-carastan-santos
ORCID : 0000-0002-1878-8137

Data Aware Large Scale Computing

Universidade Federal do ABC = Federal University of ABC = Université Fédérale de l'ABC [Brazil]

Raphael Yokoingawa de Camargo

Role: Author
PersonId : 1021122

Universidade Federal do ABC = Federal University of ABC = Université Fédérale de l'ABC [Brazil]

Abstract

Dynamic scheduling of tasks on large-scale HPC platforms is normally accomplished using ad hoc heuristics based on task characteristics, combined with some backfilling strategy. Defining heuristics that work efficiently in different scenarios is difficult, especially given the large variety of task types and platform architectures. In this work, we present a methodology based on simulation and machine learning to obtain dynamic scheduling policies. Using simulations and a workload generation model, we determine the characteristics of tasks that lead to a reduction in the mean slowdown of tasks in an execution queue. Modeling these characteristics with a nonlinear function, and applying this function to select the next task to execute from the queue, dramatically improved the mean task slowdown on synthetic workloads. When applied to real workload traces from highly different machines, these functions still yielded significant performance improvements, attesting to the generalization capability of the obtained heuristics.
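
The selection step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Task` fields, the `simulate` helper, the single-resource machine model (one task runs at a time, no backfilling), and in particular `nonlinear_score`, which is a hypothetical function and not the one obtained in the paper. Slowdown is computed with the common definition (wait time + run time) / run time.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    runtime: float      # processing time in seconds (assumed known)
    cores: int          # requested cores (used only by the score here)
    submit_time: float  # arrival time in seconds


def slowdown(wait: float, runtime: float) -> float:
    # Common definition: time in system divided by run time.
    return (wait + runtime) / runtime


def fcfs_score(task: Task, now: float) -> float:
    # First-come, first-served baseline: earliest arrival wins.
    return task.submit_time


def nonlinear_score(task: Task, now: float) -> float:
    # Hypothetical nonlinear function of task characteristics.
    # Lower score means higher priority. NOT the paper's function.
    return math.log10(1.0 + task.runtime) * math.sqrt(task.cores)


def simulate(tasks: List[Task], score: Callable[[Task, float], float]) -> float:
    """Run tasks one at a time on a single resource (a simplification)
    and return the mean slowdown under the given scoring policy."""
    pending = sorted(tasks, key=lambda t: t.submit_time)
    queue: List[Task] = []
    slowdowns: List[float] = []
    now = 0.0
    while pending or queue:
        # Move every task that has arrived into the waiting queue.
        while pending and pending[0].submit_time <= now:
            queue.append(pending.pop(0))
        if not queue:
            # Machine is idle: jump to the next arrival.
            now = pending[0].submit_time
            continue
        # Select the next task: the one with the lowest score.
        nxt = min(queue, key=lambda t: score(t, now))
        queue.remove(nxt)
        slowdowns.append(slowdown(now - nxt.submit_time, nxt.runtime))
        now += nxt.runtime
    return sum(slowdowns) / len(slowdowns)


# Toy workload, invented for this example.
TASKS = [
    Task("A", runtime=100.0, cores=4, submit_time=0.0),
    Task("B", runtime=1.0, cores=1, submit_time=0.0),
    Task("C", runtime=10.0, cores=2, submit_time=0.0),
]

if __name__ == "__main__":
    print("FCFS mean slowdown:     ", simulate(TASKS, fcfs_score))
    print("Scored mean slowdown:   ", simulate(TASKS, nonlinear_score))
```

On this toy workload the scored policy runs the short tasks first, so the long task no longer inflates their slowdown. A real evaluation would need a multi-core platform model, backfilling, and realistic workload traces, as the abstract indicates.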

Keywords

Scheduling; High Performance Computing; Machine Learning

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

paper-hal.pdf (3.23 MB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01618940

Submitted on: Wednesday, October 18, 2017, 17:16:59

Last modified on: Thursday, April 4, 2024, 18:23:01

Long-term archiving on: Friday, January 19, 2018, 14:04:37

Dates and versions

hal-01618940, version 1 (18-10-2017)

Identifiers

HAL Id: hal-01618940, version 1

Cite

Danilo Carastan-Santos, Raphael Yokoingawa de Camargo. Obtaining Dynamic Scheduling Policies with Simulation and Machine Learning. SC'17 - International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Nov 2017, Denver, United States. ⟨hal-01618940⟩


Collections

UGA CNRS INRIA LIG LIG_SRCPR INRIA2 LIG-SRCPR-DATAMOVE LIG_SIDCH

689 Views

1569 Downloads
