HAL will be down for maintenance from Friday, June 10 at 4pm through Monday, June 13 at 9am. More information
Skip to Main content Skip to Navigation
Journal articles

Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms

Anne Benoit 1, 2 Louis-Claude Canon 3 Emmanuel Jeannot 4 Yves Robert 1, 2
2 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
3 ALGORILLE - Algorithms for the Grid
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
4 RUNTIME - Efficient runtime systems for parallel architectures
Inria Bordeaux - Sud-Ouest, UB - Université de Bordeaux, CNRS - Centre National de la Recherche Scientifique : UMR5800
Abstract : This paper deals with the reliability of task graph schedules with transient and fail-stop failures. While computing the reliability of a given schedule is easy in the absence of task replication, the problem becomes much more difficult when task replication is used. We fill a complexity gap of the scheduling literature: our main result is that this reliability problem is #P'-Complete (hence at least as hard as NP-Complete problems), both for transient and for fail-stop processor failures. We also study the evaluation of a restricted class of schedules, where a task cannot be scheduled before all replicas of all its predecessors have completed their execution. Although the complexity in this case with fail-stop failures remains open, we provide an algorithm to estimate the reliability while limiting evaluation costs, and we validate this approach through simulations.
Complete list of metadata

Contributor : Louis-Claude Canon Connect in order to contact the contributor
Submitted on : Monday, December 19, 2011 - 3:42:30 PM
Last modification on : Monday, May 16, 2022 - 4:46:02 PM
Long-term archiving on: : Friday, November 16, 2012 - 4:00:49 PM


Files produced by the author(s)



Anne Benoit, Louis-Claude Canon, Emmanuel Jeannot, Yves Robert. Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms. Journal of Scheduling, Springer Verlag, 2012, 15 (5), pp.615-627. ⟨10.1007/s10951-011-0236-y⟩. ⟨hal-00653477⟩



Record views


Files downloads