Optimizing performance and reliability on heterogeneous parallel systems: Approximation algorithms and heuristics

Emmanuel Jeannot 1, 2 Erik Saule 3 Denis Trystram 4, *
* Corresponding author
1 RUNTIME - Efficient runtime systems for parallel architectures
Inria Bordeaux - Sud-Ouest, UB - Université de Bordeaux, CNRS - Centre National de la Recherche Scientifique : UMR5800
4 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: We study the problem of scheduling tasks (with and without precedence constraints) on a set of related processors which have a probability of failure governed by an exponential law. The goal is to design approximation algorithms or heuristics that optimize both makespan and reliability. First, we show that both objectives are contradictory and that the number of points of the Pareto-front can be exponential. This means that this problem cannot be approximated by a single schedule. Second, for independent unitary tasks, we provide an optimal scheduling algorithm where the objective is to maximize the reliability subject to makespan minimization. For the bi-objective optimization, we provide a (1+ε, 1)-approximation algorithm of the Pareto-front. Next, for independent arbitrary tasks, we propose a ⟨2, 1⟩-approximation algorithm (i.e. for any fixed value of the makespan, the obtained solution is optimal on the reliability and no more than twice the given makespan) that has a much lower complexity than the other existing algorithms. This solution is used to derive a (2+ε, 1)-approximation of the Pareto-front of the problem. All these proposed solutions are discriminated by the value of the product {failure rate} × {unitary instruction execution time} of each processor, which appears to be a crucial parameter in the context of bi-objective optimization. Based on this observation, we provide a general method for converting scheduling heuristics on heterogeneous clusters into heuristics that take into account the reliability when there are precedence constraints. The average behaviour is studied by extensive simulations. Finally, we discuss the specific case of scheduling a chain of tasks which leads to optimal results.
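To illustrate why the product {failure rate} × {unitary instruction execution time} matters, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a simplified model in which processor j runs a unit task in time tau[j], fails at rate lam[j], and the schedule succeeds with probability exp(-Σ lam[j] × busy time of j). The names `evaluate`, `greedy_fill`, `tau`, `lam`, and `deadline` are hypothetical and chosen only for this example.

```python
import math

def evaluate(counts, tau, lam):
    """Return (makespan, reliability) when processor j runs counts[j] unit tasks.

    Assumed model: processor j is busy for counts[j] * tau[j] and survives
    that period with probability exp(-lam[j] * counts[j] * tau[j]).
    """
    makespan = max(n * t for n, t in zip(counts, tau))
    exponent = sum(l * n * t for n, t, l in zip(counts, tau, lam))
    return makespan, math.exp(-exponent)

def greedy_fill(n_tasks, tau, lam, deadline):
    """Place unit tasks on processors in increasing order of lam[j] * tau[j],
    filling each one up to the makespan bound `deadline`."""
    order = sorted(range(len(tau)), key=lambda j: lam[j] * tau[j])
    counts = [0] * len(tau)
    remaining = n_tasks
    for j in order:
        capacity = int(deadline // tau[j])  # unit tasks that fit before the deadline
        counts[j] = min(capacity, remaining)
        remaining -= counts[j]
    if remaining > 0:
        raise ValueError("deadline too small for the given number of tasks")
    return counts

# Hypothetical two-processor platform: processor 0 is fast but failure-prone,
# processor 1 is slow but has the smaller lam * tau product.
tau = [1.0, 2.0]
lam = [0.01, 0.001]

tight = greedy_fill(4, tau, lam, deadline=4.0)
loose = greedy_fill(4, tau, lam, deadline=8.0)
tight_makespan, tight_reliability = evaluate(tight, tau, lam)
loose_makespan, loose_reliability = evaluate(loose, tau, lam)
```

Under these assumptions each unit task placed on processor j adds lam[j] × tau[j] to the failure exponent, so filling processors in increasing order of that product minimizes the exponent for a given makespan bound. Relaxing the bound from 4.0 to 8.0 moves all tasks to the slower, more reliable processor, which shows the two objectives pulling in opposite directions.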
Document type:
Journal article
Journal of Parallel and Distributed Computing, Elsevier, 2012, 72 (2), pp.268-280

https://hal.inria.fr/hal-00788219
Contributor: Emmanuel Jeannot
Submitted on: Thursday, 14 February 2013 - 09:33:12
Last modified on: Wednesday, 11 April 2018 - 01:53:36

Identifiers

  • HAL Id: hal-00788219, version 1

Citation

Emmanuel Jeannot, Erik Saule, Denis Trystram. Optimizing performance and reliability on heterogeneous parallel systems: Approximation algorithms and heuristics. Journal of Parallel and Distributed Computing, Elsevier, 2012, 72 (2), pp.268-280. 〈hal-00788219〉
