Report (Research Report) Year: 2005

Transparent Message-Passing Parallel Applications Checkpointing in Kerrighed

Abstract

Clusters are now widely used to run scientific applications, many of which are message-passing parallel applications with long execution times. As the number of nodes in a cluster grows, so does the probability that a node fails while an application is running, and the application's execution time may exceed the cluster's mean time between failures (MTBF). To avoid restarting an application from the beginning, fault-tolerance mechanisms such as checkpoint/restart are needed. Currently, checkpoint/restart mechanisms are either implemented directly in the application source code by application programmers or integrated into communication environments such as MPI or PVM. In this paper we propose a new approach in which checkpoint/restart mechanisms for parallel applications are implemented in a cluster single system image operating system. Although this kernel-level approach is more complex to implement than the others, it is more general: it requires no modification, recompilation, or relinking of applications, whatever communication environment they rely on. Our approach has been implemented in Kerrighed, a single system image operating system based on Linux. Performance results are presented in this paper.
Main file
RR-5755.pdf (250.2 KB)

Dates and versions

inria-00070265, version 1 (19-05-2006)

Identifiers

  • HAL Id: inria-00070265, version 1

Cite

Matthieu Fertré, Christine Morin. Transparent Message-Passing Parallel Applications Checkpointing in Kerrighed. [Research Report] RR-5755, INRIA. 2005, pp.13. ⟨inria-00070265⟩