The Architecture of the XtreemOS Grid Checkpointing Service

John Mehnert-Spahn (1, *), Thomas Ropars (2, *), Michael Schoettner (1), Christine Morin (2)
* Corresponding author
2 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract: The EU-funded XtreemOS project implements a grid operating system (OS) that transparently exploits distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) to implement migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in a distributed heterogeneous environment. The latter may span millions of grid nodes that use different system-specific checkpointers, each saving and restoring application and kernel data structures on a grid node. In this paper we present the architecture of the XtreemGCP service, which integrates existing checkpointing solutions. Our architecture is open to different checkpointing strategies, which can be adapted to evolving failure situations or changing application requirements. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows different checkpointers to be used in a uniform way. Furthermore, we discuss other grid-related checkpointing issues, including resource conflicts during restart, security, and checkpoint file management. Although this paper presents a solution within the XtreemOS context, it can also be applied to other grid middleware or distributed operating systems.
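To make the idea of a common checkpointer API concrete, the following is a minimal sketch, not the XtreemGCP interface itself. All names in it (`KernelCheckpointer`, `InMemoryCheckpointer`, `checkpoint_job`, `restart_job`, the node and backend labels) are hypothetical and chosen for illustration only. The stand-in backend stores state in a dictionary rather than saving real processes. The sketch only shows how a grid-level service might drive different node-local checkpointers through one uniform interface.

```python
from abc import ABC, abstractmethod


class KernelCheckpointer(ABC):
    """Hypothetical uniform interface (an assumption, not the XtreemGCP API)."""

    @abstractmethod
    def checkpoint(self, job_unit: str) -> str:
        """Save a job unit and return a checkpoint identifier."""

    @abstractmethod
    def restart(self, checkpoint_id: str) -> str:
        """Restore and return the job unit saved under checkpoint_id."""


class InMemoryCheckpointer(KernelCheckpointer):
    """Stand-in backend: records job units in a dict instead of real process state."""

    def __init__(self, name: str):
        self.name = name
        self._store = {}

    def checkpoint(self, job_unit: str) -> str:
        checkpoint_id = f"{self.name}:{job_unit}:{len(self._store)}"
        self._store[checkpoint_id] = job_unit
        return checkpoint_id

    def restart(self, checkpoint_id: str) -> str:
        return self._store[checkpoint_id]


def checkpoint_job(placement, checkpointers):
    """Checkpoint each job unit through the checkpointer of the node it runs on.

    placement maps job unit -> node; checkpointers maps node -> KernelCheckpointer.
    """
    return {unit: checkpointers[node].checkpoint(unit)
            for unit, node in placement.items()}


def restart_job(images, placement, checkpointers):
    """Restart each job unit from its image, again via the node-local checkpointer."""
    return [checkpointers[placement[unit]].restart(checkpoint_id)
            for unit, checkpoint_id in images.items()]


# Two nodes with different (hypothetical) backends, driven the same way.
checkpointers = {
    "node-a": InMemoryCheckpointer("backend-x"),
    "node-b": InMemoryCheckpointer("backend-y"),
}
placement = {"unit-0": "node-a", "unit-1": "node-b"}
images = checkpoint_job(placement, checkpointers)
restored = restart_job(images, placement, checkpointers)
```

In this sketch the grid-level functions never inspect which backend a node uses. That separation is the property the abstract attributes to a common kernel checkpointer API.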
Document type: Report
[Research Report] RR-6772, INRIA. 2008, pp.19

Cited literature: 23 references

https://hal.inria.fr/inria-00346955
Contributor: Thomas Ropars
Submitted on: Friday, December 12, 2008 - 18:43:55
Last modified on: Thursday, January 11, 2018 - 06:20:10
Document(s) archived on: Tuesday, June 8, 2010 - 16:09:59

File

RR-6772.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00346955, version 1

Citation

John Mehnert-Spahn, Thomas Ropars, Michael Schoettner, Christine Morin. The Architecture of the XtreemOS Grid Checkpointing Service. [Research Report] RR-6772, INRIA. 2008, pp.19. 〈inria-00346955〉
