CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in KAAPI

Xavier Besseron 1, Samir Jafar 1, Thierry Gautier 1, Jean-Louis Roch 1
1 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Fault tolerance protocols play an important role in today's long-running scientific parallel applications, because the probability of failure can be significant given the number of unreliable components involved in a simulation. In this paper we present our approach and preliminary results for a new checkpoint/recovery protocol based on a coordinated scheme. This protocol is tightly coupled to the availability of an abstract representation of the execution.
Document type: Conference paper
ICTTA'06 IEEE Conference on Information and Communication Technologies: from Theory to Applications, Apr 2006, Damascus, Syria. 2006, 〈10.1109/ICTTA.2006.1684955〉
https://hal.inria.fr/hal-00684864
Contributor: Ist Rennes
Submitted on: Tuesday, April 3, 2012 - 13:24:36
Last modified on: Thursday, January 11, 2018 - 06:22:02

Citation

Xavier Besseron, Samir Jafar, Thierry Gautier, Jean-Louis Roch. CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in KAAPI. ICTTA'06 IEEE Conference on Information and Communication Technologies: from Theory to Applications, Apr 2006, Damascus, Syria. 2006, 〈10.1109/ICTTA.2006.1684955〉. 〈hal-00684864〉
