Conference papers

On the complexity of scheduling checkpoints for computational workflows

Yves Robert 1, 2 Frédéric Vivien 1, 2 Dounia Zaidouni 1, 2, *
* Corresponding author
2 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract: This paper deals with the complexity of scheduling computational workflows in the presence of exponentially distributed failures. When such a failure occurs, rollback and recovery are used so that the execution can resume from the last checkpointed state. The goal is to minimize the expected execution time; we must decide in which order to execute the tasks, and whether or not to checkpoint after the completion of each task. We show that this scheduling problem is strongly NP-complete, and propose a polynomial-time dynamic programming algorithm for the case where the application graph is a linear chain. These results lay the theoretical foundations of the problem, and constitute a prerequisite for discussing scheduling strategies for arbitrary DAGs of moldable tasks subject to general failure distributions.
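For readers who want a concrete picture of the linear-chain case, the following is a minimal sketch of a dynamic program over checkpoint positions. It is not taken from the paper. It assumes the closed form commonly used in the checkpointing literature for the expected time to complete work W followed by a checkpoint of cost C, with failure rate λ, downtime D and recovery cost R, namely e^{λR} (1/λ + D) (e^{λ(W+C)} − 1). The names `expected_segment_time` and `schedule_chain`, the per-task cost lists, and the convention that the last task always ends a segment are all hypothetical choices made for illustration.

```python
import math


def expected_segment_time(work, ckpt, recovery, downtime, lam):
    """Expected time to run `work` then a checkpoint of cost `ckpt`.

    Assumed closed form for exponential failures of rate `lam`:
    exp(lam * R) * (1 / lam + D) * (exp(lam * (W + C)) - 1).
    """
    return (math.exp(lam * recovery) * (1.0 / lam + downtime)
            * math.expm1(lam * (work + ckpt)))


def schedule_chain(works, ckpts, recoveries, downtime, lam):
    """Choose checkpoint positions on a linear chain of tasks.

    works[j]      : duration of task j
    ckpts[j]      : cost of checkpointing right after task j
    recoveries[i] : cost of restoring the state needed to start task i
    Returns (expected makespan, indices of tasks followed by a checkpoint).
    Assumption: a segment always ends after the last task; set
    ckpts[-1] = 0 if no final checkpoint is wanted.
    """
    n = len(works)
    best = [0.0] * (n + 1)   # best[i]: optimal expected time for tasks i..n-1
    choice = [n] * (n + 1)   # choice[i]: last task of the segment starting at i
    for i in range(n - 1, -1, -1):
        best[i] = math.inf
        work = 0.0
        for j in range(i, n):
            work += works[j]
            # Memorylessness lets the expected times of segments add up.
            cost = expected_segment_time(
                work, ckpts[j], recoveries[i], downtime, lam) + best[j + 1]
            if cost < best[i]:
                best[i] = cost
                choice[i] = j
    checkpoints = []
    i = 0
    while i < n:
        checkpoints.append(choice[i])
        i = choice[i] + 1
    return best[0], checkpoints
```

With n tasks the two nested loops examine O(n²) segments, which is consistent with a polynomial-time algorithm for chains; the paper's actual recurrence and cost model may differ from this sketch.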

https://hal.inria.fr/hal-00763382
Contributor: Equipe Roma
Submitted on: Monday, December 10, 2012 - 4:21:37 PM
Last modification on: Tuesday, November 19, 2019 - 2:40:15 AM
Long-term archiving on: Saturday, December 17, 2016 - 11:48:19 PM

File

Complexity-Scheduling-Zaidouni...
Files produced by the author(s)

Citation

Yves Robert, Frédéric Vivien, Dounia Zaidouni. On the complexity of scheduling checkpoints for computational workflows. FTXS'2012, the Workshop on Fault-Tolerance for HPC at Extreme Scale, in conjunction with the 42nd Annual IEEE/IFIP Int. Conf. on Dependable Systems and Networks (DSN 2012), 2012, Boston, United States. ⟨10.1109/DSNW.2012.6264675⟩. ⟨hal-00763382⟩
