Reliability Versus Performance for Critical Applications

Alain Girault 1, *, Erik Saule 2, 3, Denis Trystram 4, 3
* Corresponding author
1 POP ART - Programming languages, Operating Systems, Parallelism, and Aspects for Real-Time
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
3 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Applications implemented on critical systems are subject to both safety-critical and real-time constraints. Classically, applications are specified as precedence task graphs that must be scheduled onto a given heterogeneous multiprocessor target architecture. We propose a new method for simultaneously optimizing two objectives: the execution time (makespan) and the reliability of the schedule. The problem is decomposed into two successive steps: a spatial allocation step, during which reliability is maximized (using a randomized algorithm), and a scheduling step, during which the makespan is minimized (using a list scheduling algorithm). This decomposition allows us to produce several trade-off solutions, from which the user can choose the one that best fits the application's requirements. Reliability is increased by replicating suitable tasks onto well-chosen processors. Our fault model assumes that processors are fail-silent, that they are subject to transient failures, and that failure occurrences follow a Poisson law with a constant parameter. We assess and validate our method through extensive simulations on both random graphs and real application graphs. These simulations show that the method is competitive in terms of makespan with existing reference scheduling methods for heterogeneous processors (HEFT), while providing better reliability.
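To make the fault model above concrete, here is a minimal sketch, not the authors' algorithm, of how reliability can be computed under a constant-parameter Poisson failure law. It assumes that each processor j has a constant failure rate λ_j, so a task running for d time units on processor j completes with probability exp(-λ_j d); that failures on different processors are independent; and that a task replicated on several fail-silent processors succeeds if at least one replica completes. Communication failures and precedence effects are ignored, and the function names and numeric values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def replica_reliability(failure_rate: float, duration: float) -> float:
    """Probability that a single replica completes: no failure occurs during
    `duration` under a constant-rate Poisson process (exponential law)."""
    return math.exp(-failure_rate * duration)


def task_reliability(failure_rates: Sequence[float], durations: Sequence[float]) -> float:
    """Probability that at least one replica of a task completes.

    Assumption (illustrative): processors are fail-silent and fail
    independently, so the task fails only if every replica fails.
    Durations may differ per replica because processors are heterogeneous.
    """
    if len(failure_rates) != len(durations):
        raise ValueError("one duration per replica is required")
    prob_all_fail = 1.0
    for rate, dur in zip(failure_rates, durations):
        prob_all_fail *= 1.0 - replica_reliability(rate, dur)
    return 1.0 - prob_all_fail


def schedule_reliability(tasks: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Reliability of a whole schedule as the product of per-task
    reliabilities (assumes task successes are independent)."""
    result = 1.0
    for rates, durs in tasks:
        result *= task_reliability(rates, durs)
    return result


if __name__ == "__main__":
    # Hypothetical task: 10 time units on processors with rates 1e-3 and 2e-3.
    single = task_reliability([1e-3], [10.0])
    replicated = task_reliability([1e-3, 2e-3], [10.0, 10.0])
    print(f"single replica: {single:.6f}")
    print(f"two replicas:   {replicated:.6f}")
```

Replicating the task on a second processor can only raise its reliability in this model, but each replica also occupies processor time. That cost is why the makespan and the reliability pull in opposite directions, and why a set of trade-off solutions is useful.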
Document type:
Journal article
Journal of Parallel and Distributed Computing, Elsevier, 2009, 69 (3), pp. 326-336. 〈10.1016/j.jpdc.2008.11.002〉

https://hal.inria.fr/hal-00753169
Contributor: Alain Girault
Submitted on: Saturday, 17 November 2012, 22:49:28
Last modified on: Thursday, 11 January 2018, 06:22:03


Citation

Alain Girault, Erik Saule, Denis Trystram. Reliability Versus Performance for Critical Applications. Journal of Parallel and Distributed Computing, Elsevier, 2009, 69 (3), pp. 326-336. 〈10.1016/j.jpdc.2008.11.002〉. 〈hal-00753169〉
