Resilience for Collaborative Applications on Clouds

Toan Nguyen (1), Jean-Antoine Desideri (1)
(1) OPALE - Optimization and control, numerical algorithms and integration of complex multidiscipline systems governed by PDE; CRISAM - Inria Sophia Antipolis - Méditerranée; JAD - Laboratoire Jean Alexandre Dieudonné (UMR 6621)
Abstract: Because e-Science applications are data-intensive and require long execution runs, it is important that they feature fault-tolerance mechanisms. Cloud and grid computing infrastructures often support system and network fault tolerance. They repair and prevent communication and software errors. They also allow checkpointing of applications and duplication of jobs and data to prevent catastrophic hardware failures. However, only preliminary work has been done so far on application resilience, i.e., the ability to resume normal execution following application errors and abnormal executions. This paper is an overview of open issues and solutions for the detection and management of such errors. It also gives an overview of the implementation of a workflow management system to design, deploy, execute, monitor, restart and resume distributed HPC applications on cloud infrastructures in case of failures.
Document type:
Conference paper
B. Murgante et al. ICCSA2012 - 12th International Conference on Computational Science and Its Applications, Jun 2012, Salvador de Bahia, Brazil. Springer, 7336, pp.418-433, 2012, Lecture Notes in Computer Science - LNCS

Cited literature: 20 references

https://hal.inria.fr/hal-00700571
Contributor: Toan Nguyen
Submitted on: Wednesday, May 23, 2012 - 14:36:03
Last modified on: Thursday, May 3, 2018 - 13:32:55
Document(s) archived on: Friday, August 24, 2012 - 02:34:10

File

ICCSA2012.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00700571, version 1

Citation

Toan Nguyen, Jean-Antoine Desideri. Resilience for Collaborative Applications on Clouds. B. Murgante et al. ICCSA2012 - 12th International Conference on Computational Science and Its Applications, Jun 2012, Salvador de Bahia, Brazil. Springer, 7336, pp.418-433, 2012, Lecture Notes in Computer Science - LNCS. 〈hal-00700571〉

Metrics

Record views

202

File downloads

149