A Fault-Tolerant Approach to Distributed Applications

Toan Nguyen 1 Jean-Antoine Desideri 1 Laurentiu Trifan 1
1 OPALE - Optimization and control, numerical algorithms and integration of complex multidiscipline systems governed by PDE
CRISAM - Inria Sophia Antipolis - Méditerranée , JAD - Laboratoire Jean Alexandre Dieudonné : UMR6621
Abstract: Distributed computing infrastructures, e.g., grids and clouds, support system and network fault-tolerance. They transparently repair and prevent communication and system software errors. They also allow duplication and migration of jobs and data to guard against hardware failures. However, only limited work has been done so far on application resilience, i.e., the ability to resume normal execution after errors and abnormal executions in distributed environments. This paper addresses issues in application resilience, specifically fault-tolerance to algorithmic errors and to resource allocation failures. It discusses solutions for error detection and management. It also gives an overview of a platform used to deploy, execute, monitor, restart and resume distributed applications on grid and cloud infrastructures in case of unexpected behavior.
Document type:
Conference paper
Parallel and Distributed Processing Techniques and Applications (PDPTA'13), Jul 2013, Las Vegas, United States. 2013

Cited literature: 29 references

https://hal.inria.fr/hal-00823329
Contributor: Toan Nguyen
Submitted on: Thursday, June 13, 2013 - 14:21:25
Last modified on: Thursday, January 11, 2018 - 15:57:41
Document(s) archived on: Saturday, September 14, 2013 - 04:13:49

File

PDPTA2013.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00823329, version 3

Citation

Toan Nguyen, Jean-Antoine Desideri, Laurentiu Trifan. A Fault-Tolerant Approach to Distributed Applications. Parallel and Distributed Processing Techniques and Applications (PDPTA'13), Jul 2013, Las Vegas, United States. 2013. 〈hal-00823329v3〉

Metrics

Record views: 268

File downloads: 273