Independent Checkpointing in a Heterogeneous Grid Environment

Abstract: The EU-funded XtreemOS project implements an open-source grid operating system based on Linux. To provide fault tolerance and migration for grid applications, it integrates a distributed grid-checkpointing service called XtreemGCP. This service is designed to support different checkpointing protocols and to address the underlying grid-node checkpointers (e.g., BLCR, LinuxSSI, OpenVZ) transparently through a uniform interface. In this paper, we present the integration of an independent checkpointing and rollback-recovery protocol into XtreemGCP. The solution we propose is not bound to a specific checkpointer and can therefore be used transparently on top of any grid-node checkpointer. To evaluate the prototype, we run it in a heterogeneous environment composed of single-PC nodes and a Single System Image (SSI) cluster. The experimental results demonstrate the capability of the XtreemGCP service to integrate different checkpointing protocols and to independently checkpoint a distributed application within a heterogeneous grid environment. Moreover, the performance evaluation shows that our solution outperforms the existing coordinated checkpointing protocol in terms of scalability.
Document type: Journal article
Future Generation Computer Systems, Elsevier, 2012, 28 (1), pp.163-170. 〈10.1016/j.future.2011.03.012〉
Contributor: Eugen Feller
Submitted on: Monday, July 4, 2011 - 6:04:31 PM
Last modification on: Friday, November 16, 2018 - 1:40:32 AM

Eugen Feller, John Mehnert-Spahn, Michael Schoettner, Christine Morin. Independent Checkpointing in a Heterogeneous Grid Environment. Future Generation Computer Systems, Elsevier, 2012, 28 (1), pp.163-170. 〈10.1016/j.future.2011.03.012〉. 〈inria-00605914〉


