Conference paper, Year: 2005

Theft-Induced Checkpointing for Reconfigurable Dataflow Applications

Samir Jafar, Axel W. Krings, Thierry Gautier, Jean-Louis Roch

Abstract

In this paper, a new checkpoint/recovery protocol called theft-induced checkpointing is defined for dataflow computations in large heterogeneous environments. The protocol is especially useful in massively parallel multi-threaded computations, as found in cluster or grid computing, and utilizes the principle of work-stealing to distribute work. By basing the state of an execution on a macro dataflow graph, the protocol shows extreme flexibility with respect to rollback. Specifically, it allows local rollback in dynamic heterogeneous systems, even under a different number of processors and processes. To maximize run-time efficiency, the overhead associated with checkpointing is shifted to the rollback operations whenever possible. Experimental results show that the induced overhead is very small.
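
The idea of tying checkpoints to work-stealing events can be illustrated with a toy sketch. This is not the protocol defined in the paper: the `Worker` class, its method names, the use of a plain list of task ids in place of a macro dataflow graph, and the choice to snapshot the victim's state just before the stolen task leaves are all assumptions made for illustration only.

```python
from collections import deque


class Worker:
    """Hypothetical worker with a local queue of pending tasks.

    The queue of task ids is a stand-in for the portion of a
    macro dataflow graph that this worker still has to execute.
    """

    def __init__(self, name, tasks=()):
        self.name = name
        self.tasks = deque(tasks)
        self.checkpoints = []  # one snapshot per theft (assumed policy)

    def checkpoint(self):
        # Record the pending work; here the trigger is a theft, not a timer.
        self.checkpoints.append(list(self.tasks))

    def steal_from(self, victim):
        """Take the oldest pending task from the victim, if any."""
        if not victim.tasks:
            return None
        victim.checkpoint()  # assumed: state is saved before work leaves
        task = victim.tasks.popleft()
        self.tasks.append(task)
        return task

    def rollback(self):
        """Restore the pending work recorded by the latest checkpoint."""
        if self.checkpoints:
            self.tasks = deque(self.checkpoints[-1])


victim = Worker("w0", ["t1", "t2", "t3"])
thief = Worker("w1")
stolen = thief.steal_from(victim)
```

In this sketch a worker that is never stolen from never takes a checkpoint, which is one simple way to picture checkpoint cost being paid only when work is redistributed. The rollback here is local to a single worker; how the actual protocol coordinates recovery across processors is not described on this page.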

Dates and versions

hal-00683887 , version 1 (30-03-2012)

Identifiers

HAL Id: hal-00683887
DOI: 10.1109/EIT.2005.1626998

Cite

Samir Jafar, Axel W. Krings, Thierry Gautier, Jean-Louis Roch. Theft-Induced Checkpointing for Reconfigurable Dataflow Applications. IEEE Electro/Information Technology Conference (EIT 2005), May 2005, Lincoln, United States. ⟨10.1109/EIT.2005.1626998⟩. ⟨hal-00683887⟩