Reliability Versus Performance for Critical Applications

Alain Girault (1, *), Erik Saule (2, 3), Denis Trystram (4, 3)
* Corresponding author
1 POP ART - Programming languages, Operating Systems, Parallelism, and Aspects for Real-Time
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
3 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Applications implemented on critical systems are subject to both safety-critical and real-time constraints. Classically, applications are specified as precedence task graphs that must be scheduled onto a given heterogeneous multiprocessor target architecture. We propose a new method for simultaneously optimizing two objectives: the execution time (makespan) and the reliability of the schedule. The problem is decomposed into two successive steps: a spatial allocation step during which the reliability is maximized (using a randomized algorithm), and a scheduling step during which the makespan is minimized (using a list scheduling algorithm). This decomposition allows us to produce several trade-off solutions, among which the user can choose the one that best fits the application's requirements. Reliability is increased by replicating appropriate tasks onto well-chosen processors. Our fault model assumes that processors are fail-silent, that they are subject to transient failures, and that failure occurrences follow a Poisson law with a constant parameter. We assess and validate our method through extensive simulations on both random graphs and graphs of real applications. The simulations show that the method is competitive in terms of makespan with HEFT, a reference scheduling method for heterogeneous processors, while providing better reliability.
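The fault model stated in the abstract (fail-silent processors, transient failures following a constant-parameter Poisson law) can be illustrated with a small sketch. With a constant failure rate λ, a processor survives an execution of duration d with probability e^(-λd). Because processors are fail-silent, a faulty replica simply produces no output, so a replicated task fails only if every one of its replicas fails. The sketch assumes that failures are independent across processors and tasks and it ignores communication links; the function names are hypothetical, and this is an illustration of the general idea rather than the authors' exact reliability formula.

```python
import math


def task_reliability(failure_rate: float, exec_time: float) -> float:
    """Probability that a processor with constant failure rate `failure_rate`
    (Poisson failure law) survives an execution lasting `exec_time`."""
    return math.exp(-failure_rate * exec_time)


def replicated_reliability(replicas: list[tuple[float, float]]) -> float:
    """Reliability of one task replicated on several processors.

    `replicas` holds one (failure_rate, exec_time) pair per processor that
    runs a copy of the task. Assumption: processors are fail-silent and fail
    independently, so the task fails only if every replica fails.
    """
    prob_all_fail = 1.0
    for rate, duration in replicas:
        prob_all_fail *= 1.0 - task_reliability(rate, duration)
    return 1.0 - prob_all_fail


def schedule_reliability(tasks: list[list[tuple[float, float]]]) -> float:
    """Reliability of a whole schedule: every task must succeed.

    Assumption (simplification): tasks fail independently and communication
    failures are ignored, so the per-task reliabilities are multiplied.
    """
    reliability = 1.0
    for replicas in tasks:
        reliability *= replicated_reliability(replicas)
    return reliability


if __name__ == "__main__":
    # Hypothetical numbers: the same task on one processor vs. two replicas.
    single = replicated_reliability([(1e-3, 10.0)])
    duplicated = replicated_reliability([(1e-3, 10.0), (1e-3, 10.0)])
    print(f"one copy: {single:.6f}, two copies: {duplicated:.6f}")
```

Replication raises the reliability of a task, but each extra replica occupies processor time that other tasks could have used, which is why the abstract treats reliability and makespan as competing objectives.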
Document type: Journal article

https://hal.inria.fr/hal-00753169
Contributor: Alain Girault
Submitted on: Saturday, November 17, 2012, 10:49:28 PM
Last modification on: Thursday, April 4, 2019, 10:18:04 AM

Citation

Alain Girault, Erik Saule, Denis Trystram. Reliability Versus Performance for Critical Applications. Journal of Parallel and Distributed Computing, Elsevier, 2009, 69 (3), pp. 326-336. doi:10.1016/j.jpdc.2008.11.002. ⟨hal-00753169⟩
