Using Cliques of Nodes to Store Desktop Grid Checkpoints

Abstract: Checkpoints that store intermediate results of computation have a fundamental impact on the computing throughput of Desktop Grid systems such as BOINC. Currently, BOINC workers store their checkpoints locally. A major limitation of this approach is that whenever a worker leaves a computation unfinished, no other worker can proceed from the last stable checkpoint. This forces tasks to be restarted from scratch when the original machine is no longer available. To overcome this limitation, we propose to share checkpoints between nodes. To organize this mechanism, we arrange nodes into complete graphs (cliques), in which nodes share all the checkpoints they compute. Cliques function as survivable units: checkpoints and tasks are not lost as long as at least one node of the clique remains alive. To simplify construction and maintenance of the cliques, we take advantage of the central supervisor of BOINC. To evaluate our solution, we combine simulation with some real data to answer the most fundamental question: what do we need to pay for increased throughput?
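The survivability property described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `Clique`, `save_checkpoint`, `leave`, and `resume_point` are hypothetical, a checkpoint is reduced to an integer step counter, and the BOINC supervisor, network transfers, and storage costs are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker holding a local copy of every checkpoint shared in its clique."""
    name: str
    alive: bool = True
    # task id -> latest checkpointed step (hypothetical integer progress counter)
    checkpoints: dict = field(default_factory=dict)


class Clique:
    """A complete graph of nodes: every live member receives every checkpoint."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def save_checkpoint(self, origin, task, step):
        # Only a node that is still present can produce a checkpoint.
        if not self.nodes[origin].alive:
            raise RuntimeError(f"{origin} has left the clique")
        # Replicate the checkpoint to all live members, the origin included.
        for node in self.nodes.values():
            if node.alive:
                node.checkpoints[task] = step

    def leave(self, name):
        # A departing node takes its local storage with it.
        self.nodes[name].alive = False

    def resume_point(self, task):
        """Return (node name, step) of the most advanced surviving copy, or None."""
        candidates = [
            (node.checkpoints[task], node.name)
            for node in self.nodes.values()
            if node.alive and task in node.checkpoints
        ]
        if not candidates:
            return None  # every copy is gone: the task restarts from scratch
        step, name = max(candidates)
        return name, step


clique = Clique(["a", "b", "c"])
clique.save_checkpoint("a", "task-1", 40)
clique.leave("a")                      # the original worker disappears
survivor = clique.resume_point("task-1")
```

In this sketch, `survivor` names one of the remaining nodes together with step 40, so the task can continue instead of restarting. Only when every member has left does `resume_point` return `None`, which corresponds to the local-only checkpointing case the abstract describes as the current limitation.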
Document type:
Conference paper
Coregrid Integration Workshop, 2008, Crete, Greece. 2008
Cited literature: 14 references

https://hal.inria.fr/hal-00953613
Contributor: Arnaud Legrand
Submitted on: Monday, 10 March 2014 - 16:50:40
Last modified on: Wednesday, 11 April 2018 - 01:52:42
Document(s) archived on: Tuesday, 10 June 2014 - 10:37:09

File

araujo_coregrid08.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00953613, version 1

Citation

Filipe Araujo, Patricio Domingues, Derrick Kondo, Luis Moura Silva. Using Cliques of Nodes to Store Desktop Grid Checkpoints. Coregrid Integration Workshop, 2008, Crete, Greece. 2008. 〈hal-00953613〉
