Theft-Induced Checkpointing for Reconfigurable Dataflow Applications

Abstract: In this paper, a new checkpoint/recovery protocol called theft-induced checkpointing is defined for dataflow computations in large heterogeneous environments. The protocol is especially useful in massively parallel multi-threaded computations, as found in cluster or grid computing, and uses the principle of work-stealing to distribute work. By basing the state of executions on a macro dataflow graph, the protocol shows extreme flexibility with respect to rollback. Specifically, it allows local rollback in dynamic heterogeneous systems, even under a different number of processors and processes. To maximize run-time efficiency, the overhead associated with checkpointing is shifted to the rollback operations whenever possible. Experimental results show that the induced overhead is very small.
Document type: Conference paper
IEEE Electro/Information Technology Conference (EIT 2005), May 2005, Lincoln, United States. 2005, 〈10.1109/EIT.2005.1626998〉

https://hal.inria.fr/hal-00683887
Contributor: Ist Rennes
Submitted on: Friday, March 30, 2012 - 10:17:38
Last modified on: Tuesday, April 24, 2018 - 13:37:25

Citation

Samir Jafar, Axel W. Krings, Thierry Gautier, Jean-Louis Roch. Theft-Induced Checkpointing for Reconfigurable Dataflow Applications. IEEE Electro/Information Technology Conference (EIT 2005), May 2005, Lincoln, United States. 2005, 〈10.1109/EIT.2005.1626998〉. 〈hal-00683887〉
