Scheduling Computational Workflows on Failure-Prone Platforms

Abstract: We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery are used to resume execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and whether or not to checkpoint each task after it completes. We give a polynomial-time algorithm for fork graphs and show that the problem is NP-complete for join graphs. Our main result is a polynomial-time algorithm to compute the expected execution time of a workflow with specified to-be-checkpointed tasks. Using this algorithm as a basis, we propose efficient heuristics for solving the scheduling problem. We evaluate these heuristics for representative workflow configurations.
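
To make the cost model in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It uses the classical closed form for the expected time to complete W units of work followed by a checkpoint of cost C under exponential failures of rate λ, with downtime D and recovery cost R: E = e^{λR} (1/λ + D) (e^{λ(W+C)} − 1). The function names, the parameter names, and the restriction to a linear chain of tasks with a uniform checkpoint cost are assumptions made for illustration. The general workflow (DAG) case treated in the paper is not covered here.

```python
import math


def expected_time(work, checkpoint, recovery, downtime, rate):
    """Expected time to run `work` then a checkpoint of cost `checkpoint`.

    Assumes exponential failures with rate `rate`; each failure costs
    `downtime` followed by `recovery`, and failures may also strike
    during recovery (classical closed form, hypothetical names).
    """
    if rate == 0:
        return work + checkpoint  # failure-free case
    return (math.exp(rate * recovery)
            * (1.0 / rate + downtime)
            * math.expm1(rate * (work + checkpoint)))


def chain_expected_time(works, checkpointed, checkpoint, recovery, downtime, rate):
    """Expected makespan of a linear chain of tasks (illustrative only).

    `checkpointed[i]` says whether task i is checkpointed after it completes.
    By memorylessness, segments between checkpoints can be summed.
    Assumption: before the first checkpoint, a restart has no recovery cost.
    """
    total = 0.0
    segment = 0.0   # work accumulated since the last checkpoint
    rec = 0.0       # recovery cost that applies to the current segment
    for w, ck in zip(works, checkpointed):
        segment += w
        if ck:
            total += expected_time(segment, checkpoint, rec, downtime, rate)
            segment = 0.0
            rec = recovery
    if segment > 0:
        # trailing tasks that are not followed by a checkpoint
        total += expected_time(segment, 0.0, rec, downtime, rate)
    return total


if __name__ == "__main__":
    works = [100.0, 100.0, 100.0]
    no_ckpt = chain_expected_time(works, [False, False, False], 5.0, 5.0, 1.0, 0.01)
    all_ckpt = chain_expected_time(works, [True, True, True], 5.0, 5.0, 1.0, 0.01)
    print(no_ckpt, all_ckpt)
```

Comparing the two printed values for different failure rates shows the trade-off the scheduling problem captures: checkpoints add overhead in failure-free runs but limit the work lost on each failure.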
Document type:
Conference paper
17th Workshop on Advances in Parallel and Distributed Computational Models, May 2015, Hyderabad, India. 2015, 2015 International Parallel and Distributed Processing Symposium Workshop (APDCM). 〈10.1109/IPDPSW.2015.33〉

https://hal.inria.fr/hal-01251939
Contributor: Equipe Roma
Submitted on: Thursday, January 7, 2016 - 04:18:25
Last modified on: Saturday, April 21, 2018 - 01:27:22
Document(s) archived on: Friday, April 8, 2016 - 13:08:32

File

apdcm.pdf
Files produced by the author(s)

Citation

Guillaume Aupy, Anne Benoit, Henri Casanova, Yves Robert. Scheduling Computational Workflows on Failure-Prone Platforms. 17th Workshop on Advances in Parallel and Distributed Computational Models, May 2015, Hyderabad, India. 2015, 2015 International Parallel and Distributed Processing Symposium Workshop (APDCM). 〈10.1109/IPDPSW.2015.33〉. 〈hal-01251939〉
