Conference papers

Failure Analysis and Modeling in Large Multi-site Infrastructures

Abstract: Every large multi-site infrastructure, such as Grids and Clouds, must implement fault-tolerance mechanisms and smart schedulers to keep operating continuously even when resources fail. Evaluating the efficiency of such mechanisms and schedulers requires representative failure models that capture the properties of real-world failure data. This paper shows that failures in multi-site infrastructures are far from randomly distributed. We propose a failure model that captures features observed in real failure traces.

Cited literature: 19 references
Contributor: Hal Ifip
Submitted on: Tuesday, March 14, 2017 - 2:19:15 PM
Last modification on: Tuesday, October 19, 2021 - 11:58:54 PM
Long-term archiving on: Thursday, June 15, 2017 - 2:22:46 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Tran Ngoc Minh, Guillaume Pierre. Failure Analysis and Modeling in Large Multi-site Infrastructures. 13th International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2013, Florence, Italy. pp.127-140, ⟨10.1007/978-3-642-38541-4_10⟩. ⟨hal-01489451⟩