Failure Analysis and Modeling in Large Multi-site Infrastructures

Abstract: Every large multi-site infrastructure, such as Grids and Clouds, must implement fault-tolerance mechanisms and smart schedulers to enable continuous operation even when resource failures occur. Evaluating the efficiency of such mechanisms and schedulers requires representative failure models that capture realistic properties of real-world failure data. This paper shows that failures in multi-site infrastructures are far from being randomly distributed. We propose a failure model that captures features observed in real failure traces.
Document type:
Conference paper
Jim Dowling; François Taïani. 13th International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2013, Florence, Italy. Springer, Lecture Notes in Computer Science, LNCS-7891, pp.127-140, 2013, Distributed Applications and Interoperable Systems. 〈10.1007/978-3-642-38541-4_10〉

Cited literature: 19 references

https://hal.inria.fr/hal-01489451
Contributor: Hal Ifip
Submitted on: Tuesday, March 14, 2017 - 14:19:15
Last modified on: Tuesday, January 16, 2018 - 15:54:19
Document(s) archived on: Thursday, June 15, 2017 - 14:22:46

File

978-3-642-38541-4_10_Chapter.p...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Tran Minh, Guillaume Pierre. Failure Analysis and Modeling in Large Multi-site Infrastructures. Jim Dowling; François Taïani. 13th International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2013, Florence, Italy. Springer, Lecture Notes in Computer Science, LNCS-7891, pp.127-140, 2013, Distributed Applications and Interoperable Systems. 〈10.1007/978-3-642-38541-4_10〉. 〈hal-01489451〉

Metrics

Record views: 241

File downloads: 13