
A Checkpoint/Recovery Model for Heterogeneous Dataflow Computations Using Work-Stealing

Abstract: This paper presents a new checkpoint/recovery method for dataflow computations using work-stealing in heterogeneous environments such as those found in grid or cluster computing. With the state of the computation represented by a dynamic macro dataflow graph, the mechanisms are shown to provide effective checkpointing for multithreaded applications in heterogeneous environments. Two methods, Systematic Event Logging and Theft-Induced Checkpointing, are presented; both are efficient and extremely flexible under the system-state model, allowing recovery on different platforms with a different number of processors. A formal analysis of the overhead induced by both methods is presented, followed by an experimental evaluation on a large cluster. The results show that both methods have very small overhead and that the trade-offs between checkpointing and recovery cost can be controlled.
Document type: Conference papers
Contributor: Ist Rennes
Submitted on: Wednesday, April 4, 2012 - 4:59:22 PM
Last modification on: Friday, November 18, 2022 - 9:26:11 AM


Samir Jafar, Thierry Gautier, Axel W. Krings, Jean-Louis Roch. A Checkpoint/Recovery Model for Heterogeneous Dataflow Computations Using Work-Stealing. Euro-Par 2005, Aug 2005, Lisbon, Portugal. ⟨10.1007/11549468_74⟩. ⟨hal-00685314⟩


