Checkpointing Strategies for Scheduling Computational Workflows

Abstract : We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and deciding for each task whether to checkpoint it or not after it completes. We give a polynomial-time optimal algorithm for fork DAGs (Directed Acyclic Graphs) and show that the problem is NP-complete with join DAGs. We also investigate the complexity of the simple case in which no task is checkpointed. Our main result is a polynomial-time algorithm to compute the expected execution time of a workflow, with a given task execution order and specified to-be-checkpointed tasks. Using this algorithm as a basis, we propose several heuristics for solving the scheduling problem. We evaluate these heuristics for representative workflow configurations.
Type de document :
Article dans une revue
International Journal of Networking and Computing, Higashi Hiroshima : Dept. of Computer Engineering, Hiroshima University, 2016, 6 (1), pp.2-26. 〈10.15803/ijnc.6.1_2〉
Liste complète des métadonnées

Littérature citée [25 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01354874
Contributeur : Equipe Roma <>
Soumis le : mardi 23 août 2016 - 10:01:25
Dernière modification le : vendredi 20 avril 2018 - 15:44:27
Document(s) archivé(s) le : jeudi 24 novembre 2016 - 12:38:55

Fichier

ijnc-final.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

Collections

Citation

Guillaume Aupy, Anne Benoit, Henri Casanova, Yves Robert. Checkpointing Strategies for Scheduling Computational Workflows. International Journal of Networking and Computing, Higashi Hiroshima : Dept. of Computer Engineering, Hiroshima University, 2016, 6 (1), pp.2-26. 〈10.15803/ijnc.6.1_2〉. 〈hal-01354874〉

Partager

Métriques

Consultations de la notice

390

Téléchargements de fichiers

68