Scheduling for fault-tolerance: an introduction

Abstract : This report provides an introduction to the design of scheduling algorithms to cope with faults on large-scale parallel platforms. We study checkpointing and show how to derive the optimal checkpointing period. Then we explain how to combine checkpointing with fault prediction, and discuss how the optimal period is modified when this combination is used. Finally, we follow the same approach for the combination of checkpointing with replication.
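As an illustration of what an "optimal checkpointing period" means, here is a minimal sketch based on the classical first-order Young/Daly approximation. This record page does not reproduce the report's derivation, so the formula, the waste model (which ignores downtime and recovery cost), and the function and parameter names below are all assumptions chosen for illustration.

```python
import math


def young_daly_period(mtbf: float, checkpoint_cost: float) -> float:
    """First-order optimal checkpointing period, sqrt(2 * mtbf * C).

    Assumption: classical Young/Daly approximation, valid when the
    checkpoint cost C is small compared with the platform MTBF.
    """
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def first_order_waste(period: float, mtbf: float, checkpoint_cost: float) -> float:
    """Fraction of time lost, to first order: C / T + T / (2 * mtbf).

    The first term is the checkpointing overhead, the second is the
    expected re-execution after a fault (half a period on average).
    """
    return checkpoint_cost / period + period / (2.0 * mtbf)


if __name__ == "__main__":
    # Hypothetical values, in the same time unit: MTBF of 50, checkpoint cost of 1.
    mtbf, cost = 50.0, 1.0
    best = young_daly_period(mtbf, cost)
    print(best, first_order_waste(best, mtbf, cost))
```

Under this simplified model, checkpointing more often than the computed period increases overhead, while checkpointing less often increases the work lost per fault; the period above balances the two terms.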
Document type :
Reports

https://hal.inria.fr/hal-01393192
Contributor : Equipe Roma
Submitted on : Monday, November 7, 2016 - 4:54:29 AM
Last modification on : Thursday, August 22, 2019 - 2:44:01 PM
Long-term archiving on : Wednesday, February 8, 2017 - 12:26:41 PM

File

rr8971.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01393192, version 1

Citation

Guillaume Aupy, Yves Robert. Scheduling for fault-tolerance: an introduction. [Research Report] RR-8971, INRIA. 2016. ⟨hal-01393192v1⟩
