Fault Tolerance in Cluster Federations with O2P-CF - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper. Year: 2008

Fault Tolerance in Cluster Federations with O2P-CF

Abstract

Fault tolerance is one of the key issues for large-scale applications executed on high-performance computing systems. In a cluster federation, several clusters are gathered to provide huge computing power. To work efficiently on such systems, network characteristics have to be taken into account: the latency between two nodes in different clusters is much higher than the latency between two nodes in the same cluster. In this paper, we present O2P-CF, a message logging protocol well suited to providing fault tolerance for message-passing applications executed on cluster federations. O2P-CF combines O2P, an extremely optimistic message logging protocol, with a pessimistic message logging protocol.
No file deposited

Dates and versions

inria-00424025, version 1 (13-10-2009)

Identifiers

  • HAL Id: inria-00424025, version 1

Cite

Thomas Ropars, Christine Morin. Fault Tolerance in Cluster Federations with O2P-CF. Resilience 2008, Workshop on Resiliency in High Performance Computing, May 2008, Lyon, France. ⟨inria-00424025⟩