Conference Papers, Year: 2013

Failure Analysis and Modeling in Large Multi-Site Infrastructures

Minh Tran Ngoc, Guillaume Pierre

Abstract

Large multi-site infrastructures such as Grids and Clouds must implement fault-tolerance mechanisms and smart schedulers to enable continuous operation even when resource failures occur. Evaluating the efficiency of such mechanisms and schedulers requires representative failure models that capture realistic properties of real-world failure data. This paper shows that failures in multi-site infrastructures are far from being randomly distributed. We propose a failure model that captures features observed in real failure traces.
Main file
paper_45.pdf (359.49 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-00804747, version 1 (26-03-2013)

Identifiers

  • HAL Id: hal-00804747, version 1

Cite

Minh Tran Ngoc, Guillaume Pierre. Failure Analysis and Modeling in Large Multi-Site Infrastructures. 13th International IFIP Conference on Distributed Applications and Interoperable Systems, IFIP, Jun 2013, Florence, Italy. ⟨hal-00804747⟩