Replication Is More Efficient Than You Think

Anne Benoit; Thomas Hérault; Valentin Le Fèvre; Yves Robert

doi:10.1145/3295500.3356171

Conference paper, Year: 2019

Anne Benoit

Role: Author
PersonId : 182817
IdHAL : anne-benoit
ORCID : 0000-0003-2910-3540
IdRef : 074758438

Optimisation des ressources : modèles, algorithmes et ordonnancement

Laboratoire de l'Informatique du Parallélisme

Thomas Hérault

Role: Author
PersonId : 954004

Innovative Computing Laboratory [Knoxville]

Valentin Le Fèvre

Role: Author

Yves Robert

Role: Author
PersonId : 739318
IdHAL : yves-robert
ORCID : 0000-0003-2361-055X
IdRef : 029813611

Laboratoire de l'Informatique du Parallélisme

Optimisation des ressources : modèles, algorithmes et ordonnancement

Innovative Computing Laboratory [Knoxville]

Abstract

This paper revisits replication coupled with checkpointing for fail-stop errors. Replication enables the application to survive many fail-stop errors, thereby allowing for longer checkpointing periods. Previously published works use replication with the no-restart strategy, which works as follows: (i) compute the application Mean Time To Interruption (MTTI) M as a function of the number of processor pairs and the individual processor Mean Time Between Failures (MTBF); (ii) use checkpointing period T = √(2MC) à la Young/Daly, where C is the checkpoint duration; and (iii) never restart failed processors until the application crashes. We introduce the restart strategy, where failed processors are restarted after each checkpoint. We compute the optimal checkpointing period T_opt for this strategy, which is much larger than T, thereby decreasing I/O pressure. We show through simulations that using T_opt and the restart strategy, instead of T and the usual no-restart strategy, significantly decreases the overhead induced by replication.
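As a rough illustration of step (ii) of the no-restart strategy, the sketch below evaluates the Young/Daly period T = √(2MC) for a given MTTI M and checkpoint duration C. The function names and the numeric values are hypothetical and are not taken from the paper. The waste estimate C/T + T/(2M) is the standard first-order approximation usually associated with Young/Daly, included here as an assumption and not as the paper's own model. The computation of M from the number of processor pairs, and the formula for T_opt under the restart strategy, are not given in this abstract and are deliberately left out.

```python
import math


def young_daly_period(mtti: float, checkpoint_cost: float) -> float:
    """Return T = sqrt(2 * M * C), in the same time unit as the inputs."""
    if mtti <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtti and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtti * checkpoint_cost)


def first_order_waste(period: float, mtti: float, checkpoint_cost: float) -> float:
    """First-order overhead estimate C/T + T/(2M) (an assumption, see lead-in)."""
    return checkpoint_cost / period + period / (2.0 * mtti)


# Hypothetical values, in seconds: M = 20000 s, C = 100 s.
M = 20000.0
C = 100.0
T = young_daly_period(M, C)
print(f"T = {T:.1f} s, estimated waste = {first_order_waste(T, M, C):.3f}")
```

Under this first-order estimate, checkpointing either twice as often or half as often as T yields a larger waste, which is the sense in which T balances checkpoint cost against lost work.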

Domains

Computer Science [cs] / Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

sc-hal.pdf (672.3 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02273142

Submitted on: Tuesday, December 3, 2019, 11:54:09

Last modified on: Tuesday, January 9, 2024, 14:12:45

Long-term archiving on: Wednesday, March 4, 2020, 17:06:13

Dates and versions

hal-02273142, version 1 (03-12-2019)

License

Attribution

Identifiers

HAL Id: hal-02273142, version 1
DOI : 10.1145/3295500.3356171

Cite

Anne Benoit, Thomas Hérault, Valentin Le Fèvre, Yves Robert. Replication Is More Efficient Than You Think. SC 2019 - International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), Nov 2019, Denver, United States. pp.1-14, ⟨10.1145/3295500.3356171⟩. ⟨hal-02273142⟩

Collections

ENS-LYON CNRS INRIA UNIV-LYON1 JLESC INRIA2 UDL
