Conference papers

Theft-Induced Checkpointing for Reconfigurable Dataflow Applications

Abstract : In this paper, a new checkpoint/recovery protocol called theft-induced checkpointing is defined for dataflow computations in large heterogeneous environments. The protocol is especially useful in massively parallel multi-threaded computations, as found in cluster or grid computing, and utilizes the principle of work-stealing to distribute work. By basing the state of executions on a macro dataflow graph, the protocol shows extreme flexibility with respect to rollback. Specifically, it allows local rollback in dynamic heterogeneous systems, even under a different number of processors and processes. To maximize run-time efficiency, the overhead associated with checkpointing is shifted to the rollback operations whenever possible. Experimental results show that the induced overhead is very small.

https://hal.inria.fr/hal-00683887
Contributor : Ist Rennes
Submitted on : Friday, March 30, 2012 - 10:17:38 AM
Last modification on : Friday, November 6, 2020 - 4:39:26 AM

Citation

Samir Jafar, Axel W. Krings, Thierry Gautier, Jean-Louis Roch. Theft-Induced Checkpointing for Reconfigurable Dataflow Applications. IEEE Electro/Information Technology Conference (EIT 2005), May 2005, Lincoln, United States. ⟨10.1109/EIT.2005.1626998⟩. ⟨hal-00683887⟩
