Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms

Anne Benoit 1, 2 Louis-Claude Canon 3 Emmanuel Jeannot 4 Yves Robert 1, 2
2 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
3 ALGORILLE - Algorithms for the Grid
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
4 RUNTIME - Efficient runtime systems for parallel architectures
Inria Bordeaux - Sud-Ouest, UB - Université de Bordeaux, CNRS - Centre National de la Recherche Scientifique : UMR5800
Abstract : This paper deals with the reliability of task graph schedules with transient and fail-stop failures. While computing the reliability of a given schedule is easy in the absence of task replication, the problem becomes much more difficult when task replication is used. We fill a complexity gap of the scheduling literature: our main result is that this reliability problem is #P'-Complete (hence at least as hard as NP-Complete problems), both for transient and for fail-stop processor failures. We also study the evaluation of a restricted class of schedules, where a task cannot be scheduled before all replicas of all its predecessors have completed their execution. Although the complexity in this case with fail-stop failures remains open, we provide an algorithm to estimate the reliability while limiting evaluation costs, and we validate this approach through simulations.
Type de document :
Article dans une revue
Journal of Scheduling, Springer Verlag, 2012, 15 (5), pp.615-627. 〈10.1007/s10951-011-0236-y〉
Liste complète des métadonnées

https://hal.inria.fr/hal-00653477
Contributeur : Louis-Claude Canon <>
Soumis le : lundi 19 décembre 2011 - 15:42:30
Dernière modification le : mercredi 11 avril 2018 - 01:55:17
Document(s) archivé(s) le : vendredi 16 novembre 2012 - 16:00:49

Fichier

JoS_rev1.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

Citation

Anne Benoit, Louis-Claude Canon, Emmanuel Jeannot, Yves Robert. Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms. Journal of Scheduling, Springer Verlag, 2012, 15 (5), pp.615-627. 〈10.1007/s10951-011-0236-y〉. 〈hal-00653477〉

Partager

Métriques

Consultations de la notice

372

Téléchargements de fichiers

244