Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Scheduling

Abstract: To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting $\varepsilon$ arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm, which we call Iso-Level CAFT, is motivated by (i) the search for a better load balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which provides a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme that masks failures, so there is no need to detect and handle them. Major achievements include a low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. The experimental results demonstrate the usefulness of Iso-Level CAFT.
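
The replication idea in the abstract can be illustrated with a small sketch. This is not the Iso-Level CAFT algorithm from the report: it ignores communications and contention entirely, and it assumes that each task gets $\varepsilon + 1$ replicas on distinct processors, which is the usual way active replication masks up to $\varepsilon$ fail-stop failures. The function names (`replicate_by_level`, `survives`), the level lists, and the task costs are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def replicate_by_level(levels, costs, num_procs, eps):
    """Greedy illustration: place eps + 1 replicas of each task on
    distinct processors, handling one level of ready tasks at a time.

    Assumption: communications and contention are ignored here; the
    report's algorithm accounts for both.
    """
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")
    load = [0.0] * num_procs
    placement = {}
    for level in levels:
        # Treat the whole chunk of ready tasks together, longest first.
        for task in sorted(level, key=lambda t: -costs[t]):
            # The eps + 1 least-loaded processors are distinct by construction.
            chosen = sorted(range(num_procs), key=lambda p: (load[p], p))[: eps + 1]
            for p in chosen:
                load[p] += costs[task]
            placement[task] = chosen
    return placement, load


def survives(placement, failed):
    """True if every task keeps at least one replica on a live processor."""
    return all(any(p not in failed for p in procs) for procs in placement.values())


# Hypothetical four-task graph, already split into precedence levels.
levels = [["t1"], ["t2", "t3"], ["t4"]]
costs = {"t1": 2.0, "t2": 3.0, "t3": 1.0, "t4": 2.0}
placement, load = replicate_by_level(levels, costs, num_procs=4, eps=1)

# Check every possible set of eps = 1 failed processors.
all_ok = all(survives(placement, set(f)) for f in combinations(range(4), 1))
```

Because each task's replicas sit on $\varepsilon + 1$ different processors, no set of $\varepsilon$ failures can remove all of them, so no failure detection step is needed in this sketch.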
https://hal.inria.fr/inria-00308794
Contributor: Mourad Hakem
Submitted on: Friday, August 1, 2008 - 5:00:40 PM
Last modification on: Friday, June 25, 2021 - 3:40:03 PM
Long-term archiving on: Thursday, June 3, 2010 - 5:36:25 PM

File

RR-6607.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00308794, version 1

Citation

Anne Benoit, Mourad Hakem, Yves Robert. Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Scheduling. [Research Report] RR-6607, INRIA. 2008, pp.21. ⟨inria-00308794⟩
