Failure Analysis and Modeling in Large Multi-Site Infrastructures

Minh Tran Ngoc (1), Guillaume Pierre (1)
(1) MYRIADS - Design and Implementation of Autonomous Distributed Systems
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique
Abstract: Every large multi-site infrastructure, such as a Grid or a Cloud, must implement fault-tolerance mechanisms and smart schedulers to enable continuous operation even when resource failures occur. Evaluating the efficiency of such mechanisms and schedulers requires representative failure models that capture the properties of real-world failure data. This paper shows that failures in multi-site infrastructures are far from being randomly distributed. We propose a failure model that captures features observed in real failure traces.
Document type: Conference paper
13th International IFIP Conference on Distributed Applications and Interoperable Systems, Jun 2013, Florence, Italy. 2013

Cited literature: 19 references

https://hal.inria.fr/hal-00804747
Contributor: Guillaume Pierre
Submitted on: Tuesday, March 26, 2013 - 11:26:55
Last modified on: Wednesday, April 11, 2018 - 02:01:21
Document(s) archived on: Sunday, April 2, 2017 - 20:28:23

File

paper_45.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00804747, version 1

Citation

Minh Tran Ngoc, Guillaume Pierre. Failure Analysis and Modeling in Large Multi-Site Infrastructures. 13th International IFIP Conference on Distributed Applications and Interoperable Systems, Jun 2013, Florence, Italy. 2013. 〈hal-00804747〉

Metrics

Record views: 389

File downloads: 249