Reports

Resilient application co-scheduling with processor redistribution

Abstract : Recently, the benefits of co-scheduling several applications have been demonstrated in a fault-free context, both in terms of performance and energy savings. However, large-scale computer systems are confronted with frequent failures, and resilience techniques must be employed to ensure the completion of large applications. Indeed, failures may create severe imbalance between applications and significantly degrade performance. In this paper, we propose to redistribute the resources assigned to each application when failures strike, in order to minimize the expected completion time of a set of co-scheduled applications. First, we introduce a formal model and establish complexity results. When no redistribution is allowed, we can minimize the expected completion time in polynomial time, while the problem becomes NP-complete with redistributions, even in a fault-free context. Therefore, we design polynomial-time heuristics that perform redistributions and account for processor failures. A fault simulator is used to perform extensive simulations that demonstrate the usefulness of redistribution and the performance of the proposed heuristics.

https://hal.inria.fr/hal-01219258
Contributor : Equipe Roma
Submitted on : Thursday, October 22, 2015 - 12:45:50 PM
Last modification on : Monday, May 16, 2022 - 4:46:02 PM
Long-term archiving on : Friday, April 28, 2017 - 7:19:47 AM

File

RR-8795.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01219258, version 1

Citation

Anne Benoit, Loïc Pottier, Yves Robert. Resilient application co-scheduling with processor redistribution. [Research Report] RR-8795, Inria. 2015. ⟨hal-01219258⟩
