Hybrid Checkpointing for Parallel Applications in Cluster Federations

Sébastien Monnet 1 Christine Morin 1 Ramamurthy Badrinath 2
1 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract: Cluster federations are very useful for applications such as large-scale code coupling. Faults may occur very frequently, so we want to use checkpoints to be able to restart applications. To take into account the constraints introduced by the cluster federation architecture, we propose a hierarchical checkpointing protocol. It uses synchronization inside clusters but only quasi-synchronous methods between clusters. Our protocol has been evaluated by simulation and is well suited to applications that can be divided into modules with a lot of communication inside modules but little between them.
Document type:
Conference paper
4th IEEE/ACM International Symposium on Cluster Computing and the Grid, Apr 2004, Chicago, IL, United States. IEEE, Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2004

Cited literature [11 references]

https://hal.inria.fr/inria-00000991
Contributor: Sébastien Monnet
Submitted on: Friday, March 4, 2016 - 23:30:45
Last modified on: Thursday, January 11, 2018 - 06:20:10
Document(s) archived on: Sunday, November 13, 2016 - 07:25:40

File

MonMorBad04CCGrid.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00000991, version 3


Citation

Sébastien Monnet, Christine Morin, Ramamurthy Badrinath. Hybrid Checkpointing for Parallel Applications in Cluster Federations. 4th IEEE/ACM International Symposium on Cluster Computing and the Grid, Apr 2004, Chicago, IL, United States. IEEE, Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2004. 〈inria-00000991v3〉


Metrics

Record views: 287

File downloads: 39