CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in KAAPI

Xavier Besseron 1, Samir Jafar 1, Thierry Gautier 1, Jean-Louis Roch 1
1 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : Fault tolerance protocols play an important role in today's long-running scientific parallel applications, because the probability of failure can be significant given the number of unreliable components involved in a simulation. In this paper we present our approach to, and preliminary results for, a new checkpoint/recovery protocol based on a coordinated scheme. The protocol is tightly coupled to the availability of an abstract representation of the execution.

https://hal.inria.fr/hal-00684864
Contributor : Ist Rennes
Submitted on : Tuesday, April 3, 2012 - 1:24:36 PM
Last modification on : Friday, October 12, 2018 - 1:17:57 AM

Citation

Xavier Besseron, Samir Jafar, Thierry Gautier, Jean-Louis Roch. CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in KAAPI. ICTTA'06 IEEE Conference on Information and Communication Technologies: from Theory to Applications, Apr 2006, Damascus, Syria. ⟨10.1109/ICTTA.2006.1684955⟩. ⟨hal-00684864⟩
