Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Scheduling

Abstract: To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting $\varepsilon$ arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm, which we call Iso-Level CAFT, is motivated by (i) the search for a better load balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which provides a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme that masks failures, so there is no need to detect and handle them. Major achievements include low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. The experimental results demonstrate the usefulness of Iso-Level CAFT.
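The two ideas named in the abstract, active replication and scheduling ready tasks level by level, can be illustrated with a minimal Python sketch. It is not the Iso-Level CAFT algorithm from the report: it assumes zero communication cost, ignores contention, and uses a simple greedy placement. The function names (`levels`, `replicate_schedule`) and the input format are hypothetical. The only property it shows is that placing $\varepsilon + 1$ replicas of each task on distinct processors leaves at least one replica alive after any $\varepsilon$ fail-stop failures.

```python
from collections import defaultdict


def levels(tasks, preds):
    """Group tasks by depth in the DAG; tasks in one level are independent."""
    depth = {}

    def d(t):
        if t not in depth:
            depth[t] = 1 + max((d(p) for p in preds.get(t, [])), default=-1)
        return depth[t]

    by_level = defaultdict(list)
    for t in tasks:
        by_level[d(t)].append(t)
    return [by_level[k] for k in sorted(by_level)]


def replicate_schedule(tasks, preds, cost, num_procs, eps):
    """Greedy sketch (assumption: no communication cost, no contention).

    Each task gets eps + 1 replicas on distinct processors, so any eps
    fail-stop failures leave at least one replica of every task.
    Returns {task: [(processor, finish_time), ...]}.
    """
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")
    free_at = [0.0] * num_procs  # time at which each processor becomes idle
    sched = {}
    for level in levels(tasks, preds):
        # Handle one whole level at a time, longest tasks first.
        for t in sorted(level, key=lambda x: -cost[x]):
            # Pessimistic data-ready time: wait for every replica of every
            # predecessor, so the bound holds whichever replicas survive.
            data_ready = max(
                (end for p in preds.get(t, []) for _, end in sched[p]),
                default=0.0,
            )
            # Pick the eps + 1 processors that become idle first.
            chosen = sorted(range(num_procs), key=lambda p: free_at[p])[: eps + 1]
            sched[t] = []
            for p in chosen:
                end = max(free_at[p], data_ready) + cost[t]
                free_at[p] = end
                sched[t].append((p, end))
    return sched
```

Calling `replicate_schedule(["a", "b", "c"], {"c": ["a", "b"]}, {"a": 2, "b": 3, "c": 1}, num_procs=3, eps=1)` returns two replicas per task, each pair on different processors. Processing a whole level before placing anything from the next one is what gives the scheduler a view of all ready tasks at once; a contention-aware variant would also have to account for the messages exchanged between replicas, which this sketch leaves out.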
Document type:
Report
[Research Report] RR-6607, INRIA. 2008, pp.21

https://hal.inria.fr/inria-00308794
Contributor: Mourad Hakem
Submitted on: Friday, August 1, 2008 - 17:00:40
Last modified on: Tuesday, January 16, 2018 - 15:36:39
Document(s) archived on: Thursday, June 3, 2010 - 17:36:25

File

RR-6607.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00308794, version 1

Citation

Anne Benoit, Mourad Hakem, Yves Robert. Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Scheduling. [Research Report] RR-6607, INRIA. 2008, pp.21. 〈inria-00308794〉
