Conference papers

Failure Analysis and Modeling in Large Multi-Site Infrastructures

Minh Tran Ngoc 1, Guillaume Pierre 1
1 MYRIADS - Design and Implementation of Autonomous Distributed Systems
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : Every large multi-site infrastructure, such as a Grid or a Cloud, must implement fault-tolerance mechanisms and smart schedulers to keep operating when resource failures occur. Evaluating the efficiency of such mechanisms and schedulers requires representative failure models that capture the properties of real-world failure data. This paper shows that failures in multi-site infrastructures are far from being randomly distributed. We propose a failure model that captures features observed in real failure traces.
Document type :
Conference papers

Contributor : Guillaume Pierre
Submitted on : Tuesday, March 26, 2013 - 11:26:55 AM
Last modification on : Tuesday, October 19, 2021 - 11:58:53 PM
Long-term archiving on : Sunday, April 2, 2017 - 8:28:23 PM




  • HAL Id : hal-00804747, version 1


Minh Tran Ngoc, Guillaume Pierre. Failure Analysis and Modeling in Large Multi-Site Infrastructures. 13th International IFIP Conference on Distributed Applications and Interoperable Systems, IFIP, Jun 2013, Florence, Italy. ⟨hal-00804747⟩


