A Decentralized and Fault Tolerant Convergence Detection Algorithm for Asynchronous Iterative Algorithms - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article, Journal of Supercomputing, Year: 2010

A Decentralized and Fault Tolerant Convergence Detection Algorithm for Asynchronous Iterative Algorithms

Abstract

This article presents an algorithm for decentralized detection of the global convergence of parallel asynchronous iterative applications. The algorithm is fault tolerant: it runs a decentralized saving (checkpointing) procedure so that, after a node crashes, the dead node can be replaced by a new one that resumes the computation from the last checkpoint. Combined with the advantages of the asynchronous iteration model, this method makes it possible to solve very large-scale problems on highly volatile parallel architectures such as peer-to-peer networks and distributed clusters. We also present the implementation of this algorithm in the JaceP2P platform, which is dedicated to designing and executing parallel asynchronous iterative applications in volatile environments. Numerous experiments show the robustness and efficiency of our algorithm.
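As a rough illustration of the checkpoint-and-replace idea described in the abstract, the following minimal Python sketch simulates a ring of nodes that each store a checkpoint on a neighbor and restart a crashed node from that checkpoint. It is not the paper's protocol and does not use the JaceP2P API: the `Node` class, the helper names, the toy update rule (whose fixed point is 2.0), and the crash schedule are all assumptions made for the example. The global convergence test is a simple centralized stand-in, whereas the article's detection is decentralized.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node holding one component of the solution."""
    rank: int
    x: float = 0.0
    iteration: int = 0
    backups: dict = field(default_factory=dict)  # checkpoints held for other nodes

    def checkpoint(self):
        return {"x": self.x, "iteration": self.iteration}


def save_to_neighbor(nodes, rank):
    # Assumption: each node stores its checkpoint on its right-hand neighbor.
    neighbor = nodes[(rank + 1) % len(nodes)]
    neighbor.backups[rank] = nodes[rank].checkpoint()


def replace_crashed(nodes, rank):
    # A fresh node takes over and resumes from the last saved checkpoint.
    neighbor = nodes[(rank + 1) % len(nodes)]
    saved = neighbor.backups.get(rank, {"x": 0.0, "iteration": 0})
    nodes[rank] = Node(rank=rank, x=saved["x"], iteration=saved["iteration"])
    return nodes[rank]


def run(n=4, tol=1e-10, checkpoint_every=4, crash_at=6, crash_rank=2,
        max_sweeps=1000):
    nodes = [Node(rank=r) for r in range(n)]
    for sweep in range(1, max_sweeps + 1):
        if sweep == crash_at:
            replace_crashed(nodes, crash_rank)  # simulated crash and recovery
        converged = []
        for node in nodes:
            # Each node uses whatever neighbor values are currently available.
            left = nodes[(node.rank - 1) % n].x
            right = nodes[(node.rank + 1) % n].x
            new_x = 0.25 * (left + right) + 1.0  # toy contraction, fixed point 2.0
            converged.append(abs(new_x - node.x) < tol)  # local convergence flag
            node.x = new_x
            node.iteration += 1
        if sweep % checkpoint_every == 0:
            for r in range(n):
                save_to_neighbor(nodes, r)
        if all(converged):  # centralized stand-in for the decentralized detection
            return nodes, sweep
    raise RuntimeError("no convergence within max_sweeps")


if __name__ == "__main__":
    final_nodes, sweeps = run()
    print(sweeps, [round(node.x, 6) for node in final_nodes])
```

In this toy run the replaced node loses the work done since its last checkpoint, yet the iteration still reaches the same fixed point, which is the property that makes restart-from-checkpoint a natural fit for asynchronous iterations.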
Main file
ccl09_ij.pdf (698.28 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00644471, version 1 (24-11-2011)

Cite

Jean-Claude Charr, Raphaël Couturier, David Laiymani. A Decentralized and Fault Tolerant Convergence Detection Algorithm for Asynchronous Iterative Algorithms. Journal of Supercomputing, 2010, 53 (2), pp.269-292. ⟨10.1007/s11227-009-0293-6⟩. ⟨hal-00644471⟩