Resilient application co-scheduling with processor redistribution

Abstract: Recently, the benefits of co-scheduling several applications have been demonstrated in a fault-free context, both in terms of performance and energy savings. However, large-scale computer systems are confronted with frequent failures, and resilience techniques must be employed to ensure the completion of large applications. Indeed, failures may create severe imbalance between applications and significantly degrade performance. In this paper, we propose to redistribute the resources assigned to each application when failures strike, in order to minimize the expected completion time of a set of co-scheduled applications. First, we introduce a formal model and establish complexity results. When no redistribution is allowed, the expected completion time can be minimized in polynomial time, whereas the problem becomes NP-complete with redistributions, even in a fault-free context. Therefore, we design polynomial-time heuristics that perform redistributions and account for processor failures. A fault simulator is used to perform extensive simulations that demonstrate the usefulness of redistribution and the performance of the proposed heuristics.
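To make the idea of redistribution after a failure concrete, here is a minimal toy sketch. It is not the model or the heuristics of the paper: it assumes perfectly parallel applications (completion time = work / processors), treats a failure as one application simply losing a processor, and ignores checkpointing and redistribution costs. The names `makespan` and `rebalance` are hypothetical.

```python
def makespan(work, procs):
    """Completion time of the slowest application.

    Toy assumption: an application with `w` units of work on `p`
    processors finishes at time w / p.
    """
    return max(w / p for w, p in zip(work, procs))


def rebalance(work, procs):
    """Greedy illustration only (not the paper's heuristics).

    Repeatedly move one processor from the earliest-finishing application
    to the latest-finishing one, as long as the makespan strictly improves.
    Every application keeps at least one processor.
    """
    procs = list(procs)
    while True:
        times = [w / p for w, p in zip(work, procs)]
        slowest = max(range(len(procs)), key=lambda i: times[i])
        donors = [i for i in range(len(procs)) if i != slowest and procs[i] > 1]
        if not donors:
            return procs
        donor = min(donors, key=lambda i: times[i])
        trial = list(procs)
        trial[donor] -= 1
        trial[slowest] += 1
        if makespan(work, trial) >= makespan(work, procs):
            return procs  # no further improvement
        procs = trial


# Three co-scheduled applications, initially balanced (all finish at t = 10).
work = [120.0, 40.0, 40.0]
procs = [12, 4, 4]

# Assumed failure model: application 1 loses one processor.
procs[1] -= 1                      # [12, 3, 4], application 1 now finishes later
before = makespan(work, procs)

after_procs = rebalance(work, procs)
after = makespan(work, after_procs)
print(after_procs, before, after)  # allocation becomes [11, 4, 4]
```

In this toy instance the failure raises the makespan, and moving a single processor from the largest application to the one that was hit brings it back down, which is the kind of imbalance the abstract describes.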
Document type:
Conference paper
International Conference on Parallel Processing (ICPP), Aug 2016, Philadelphia, United States. The 45th International Conference on Parallel Processing. 〈http://icpp2016.cs.wcupa.edu〉


https://hal.inria.fr/hal-01354863
Contributor: Equipe Roma
Submitted on: Friday, August 19, 2016 - 17:44:15
Last modified on: Friday, April 20, 2018 - 15:44:27
Document(s) archived on: Sunday, November 20, 2016 - 11:03:08

File

PID4280453.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01354863, version 1


Citation

Anne Benoit, Loïc Pottier, Yves Robert. Resilient application co-scheduling with processor redistribution. International Conference on Parallel Processing (ICPP), Aug 2016, Philadelphia, United States. The 45th International Conference on Parallel Processing. 〈http://icpp2016.cs.wcupa.edu〉. 〈hal-01354863〉


Metrics

Record views

349

File downloads

49