Independent Checkpointing in a Heterogeneous Grid Environment

Abstract : The EU-funded XtreemOS project implements an open-source grid operating system based on Linux. To provide fault tolerance and migration for grid applications, it integrates a distributed grid-checkpointing service called XtreemGCP. This service is designed to support different checkpointing protocols and to handle the underlying grid-node checkpointers (e.g. BLCR, LinuxSSI, OpenVZ) transparently through a uniform interface. In this paper, we present the integration of an independent checkpointing and rollback-recovery protocol into XtreemGCP. The proposed solution is not bound to a particular checkpointer and can therefore be used transparently on top of any grid-node checkpointer. To evaluate the prototype, we ran it in a heterogeneous environment composed of single-PC nodes and a Single System Image (SSI) cluster. The experimental results demonstrate that XtreemGCP can integrate different checkpointing protocols and independently checkpoint a distributed application in a heterogeneous grid environment. Moreover, the performance evaluation shows that our solution scales better than the existing coordinated checkpointing protocol.
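The abstract does not say how XtreemGCP computes the state to roll back to. In the textbook form of independent checkpointing, processes checkpoint without coordination, and on recovery a consistent recovery line is computed that leaves no orphan messages (a receive that is recorded while the matching send is not). The sketch below illustrates that general technique only and is not the authors' protocol. The `recovery_line` function and the `(sender, send_interval, receiver, recv_interval)` message tuples are hypothetical. It assumes every process has an initial checkpoint 0, and that interval i lies between checkpoints i and i + 1.

```python
from typing import Dict, List, Tuple

# Hypothetical message record: (sender, send_interval, receiver, recv_interval).
# Interval i of a process is the execution between its checkpoint i and
# checkpoint i + 1, so restoring checkpoint c keeps intervals 0 .. c-1.
Message = Tuple[str, int, str, int]


def recovery_line(latest: Dict[str, int], messages: List[Message]) -> Dict[str, int]:
    """Return the most recent consistent set of checkpoint indices.

    Starts from each process's latest checkpoint and rolls back any
    receiver whose restored state records a message whose send is not
    recorded by the sender (an orphan message), until none remain.
    """
    line = dict(latest)
    changed = True
    while changed:
        changed = False
        for sender, s_int, receiver, r_int in messages:
            received = r_int < line[receiver]  # receipt is in restored state
            sent = s_int < line[sender]        # send is in restored state
            if received and not sent:
                # Roll the receiver back to the checkpoint taken just before
                # the receive; each step strictly decreases an index, and
                # checkpoint 0 (the initial state) always exists.
                line[receiver] = r_int
                changed = True
    return line


if __name__ == "__main__":
    latest = {"P": 2, "Q": 2}
    # No orphan: the send precedes P's latest checkpoint.
    print(recovery_line(latest, [("P", 0, "Q", 1)]))  # {'P': 2, 'Q': 2}
    # Chained dependencies cause a cascading rollback (domino effect).
    chain = [("P", 2, "Q", 1), ("Q", 1, "P", 0), ("P", 0, "Q", 0)]
    print(recovery_line(latest, chain))  # {'P': 0, 'Q': 0}
```

In the second call, each rollback orphans another message, so both processes fall back to their initial checkpoints. This domino effect is the classic cost that independent checkpointing accepts in exchange for avoiding coordination at checkpoint time.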

https://hal.inria.fr/inria-00521443
Contributor : Eugen Feller
Submitted on : Monday, September 27, 2010 - 3:17:20 PM
Last modification on : Tuesday, April 30, 2019 - 3:12:30 PM
Long-term archiving on : Thursday, October 25, 2012 - 4:02:18 PM

File

RR-7399.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00521443, version 1

Citation

Eugen Feller, John Mehnert-Spahn, Michael Schoettner, Christine Morin. Independent Checkpointing in a Heterogeneous Grid Environment. [Research Report] RR-7399, INRIA. 2010. ⟨inria-00521443⟩