Multi-criteria scheduling of precedence task graphs on heterogeneous platforms

Abstract: Latency, fault tolerance and reliability are important requirements for many time-critical applications: such applications require latency guarantees even when processors are subject to failures. In this paper, we propose a fault-tolerant scheduling heuristic for mapping precedence task graphs onto heterogeneous systems. Our approach is based on an active replication scheme capable of supporting ε arbitrary fail-silent/fail-stop processor failures, so valid results are provided even if ε processors fail. First, we focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Next, we derive a more complex algorithm in which we not only minimize latency and support a fixed number of failures, but also improve the overall reliability. Major achievements include the low complexity of the new algorithms and a drastic reduction in the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the fault-tolerance-based active replication scheduling algorithm FTBAR.
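To make the idea of active replication concrete, here is a minimal Python sketch. It is not the heuristic proposed in the paper, nor FTBAR. It only illustrates the general principle that placing ε + 1 replicas of every task on distinct processors leaves at least one replica alive after any ε fail-stop failures. The function name `replicate_schedule`, the greedy earliest-finish-time rule, the absence of communication costs, and the pessimistic assumption that a replica waits for the slowest replica of each predecessor are all simplifying assumptions made for illustration.

```python
def replicate_schedule(tasks, preds, exec_time, num_procs, eps):
    """Greedy active-replication sketch (illustrative assumption, not the paper's algorithm).

    tasks:      task names in topological order
    preds:      dict mapping a task to the list of its predecessors
    exec_time:  exec_time[task][p] is the run time of task on processor p
    num_procs:  number of processors
    eps:        number of fail-stop processor failures to tolerate
    """
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")

    ready = [0.0] * num_procs   # time at which each processor becomes free
    finish = {}                 # task -> {processor: finish time of that replica}

    for t in tasks:
        # Pessimistic data-ready time: wait for the slowest replica of every
        # predecessor (communication costs are ignored in this sketch).
        data_ready = max(
            (max(finish[u].values()) for u in preds.get(t, [])),
            default=0.0,
        )
        # Finish time the task would have on each processor, sorted ascending.
        candidates = sorted(
            (max(ready[p], data_ready) + exec_time[t][p], p)
            for p in range(num_procs)
        )
        # Keep the eps + 1 best processors; they are distinct by construction.
        finish[t] = {}
        for f, p in candidates[: eps + 1]:
            finish[t][p] = f
            ready[p] = f

    # Upper bound on latency: the last replica of any task to complete.
    latency = max(max(replicas.values()) for replicas in finish.values())
    return finish, latency
```

Because each task's replicas sit on ε + 1 different processors, removing any ε processors from the returned mapping still leaves every task with a surviving replica. A realistic heuristic would also account for communication costs and for the extra messages that replication induces, which is the overhead the abstract says the proposed algorithms reduce.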
Document type:
Journal article
The Computer Journal, Oxford University Press (UK), 2010, 53 (6), pp.772-785. 〈10.1093/comjnl/bxp067〉
https://hal.inria.fr/hal-00787907
Contributor: Equipe Roma
Submitted on: Wednesday, February 13, 2013 - 11:54:36
Last modified on: Friday, April 20, 2018 - 15:44:24

Citation

Anne Benoit, Mourad Hakem, Yves Robert. Multi-criteria scheduling of precedence task graphs on heterogeneous platforms. The Computer Journal, Oxford University Press (UK), 2010, 53 (6), pp.772-785. 〈10.1093/comjnl/bxp067〉. 〈hal-00787907〉
