Optimal Checkpointing Period: Time vs. Energy

Abstract: This short paper deals with parallel scientific applications using non-blocking and periodic coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for total execution time and consumed energy. We characterize the optimal period for both objectives, and we assess the range of time/energy trade-offs to be made by instantiating the model with a set of realistic scenarios for Exascale systems. We give a particular emphasis to I/O transfers, because the relative cost of communication is expected to dramatically increase, both in terms of latency and consumed energy, for future Exascale platforms.
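To give a feel for what an "optimal checkpointing period" means for the time objective, here is a minimal sketch of the classical first-order Young/Daly approximation. It is not the model from this paper. The paper treats non-blocking checkpoints and also models energy, and neither is reproduced here. The sketch assumes blocking checkpoints of cost C, a platform MTBF of mu, and no downtime or recovery cost. Under those assumptions the waste fraction C/T + T/(2 mu) is minimized at T* = sqrt(2 mu C). All function names and the parameter values are hypothetical, chosen for illustration only.

```python
import math


def waste_fraction(period: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost to resilience (time objective only).

    Assumed classical model, NOT the paper's non-blocking/energy model:
    - a blocking checkpoint of cost C is taken every `period` seconds,
      which costs C / period of the time;
    - failures arrive with mean time between failures `mtbf` (mu), and each
      loses on average half a period of work, i.e. period / (2 * mu) of the time;
    - downtime and recovery are ignored.
    Only meaningful when checkpoint_cost << period << mtbf.
    """
    return checkpoint_cost / period + period / (2.0 * mtbf)


def young_daly_period(checkpoint_cost: float, mtbf: float) -> float:
    """Closed-form minimizer of waste_fraction: T* = sqrt(2 * mu * C)."""
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def grid_search_period(checkpoint_cost: float, mtbf: float, steps: int = 100_000) -> float:
    """Brute-force check of the closed form over periods in (0, mtbf]."""
    best_period, best_waste = mtbf, math.inf
    for i in range(1, steps + 1):
        period = mtbf * i / steps
        waste = waste_fraction(period, checkpoint_cost, mtbf)
        if waste < best_waste:
            best_period, best_waste = period, waste
    return best_period


if __name__ == "__main__":
    # Hypothetical values: 10-minute checkpoint, 1-day platform MTBF.
    C, MU = 600.0, 86_400.0
    t_star = young_daly_period(C, MU)
    print(f"closed form: {t_star:.1f} s, grid search: {grid_search_period(C, MU):.1f} s")
    print(f"waste at optimum: {waste_fraction(t_star, C, MU):.3%}")
```

The grid search only confirms that the closed form really minimizes this particular waste function. According to the abstract, the paper characterizes a separate optimal period for the energy objective, and that period generally differs from the time-optimal one. This sketch does not capture that trade-off.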
Document type:
Conference paper
Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov 2013, Denver, United States.
Cited literature: 15 references

https://hal.inria.fr/hal-00926199
Contributor: Equipe Roma
Submitted on: Thursday, January 9, 2014, 11:14:04
Last modified on: Friday, April 20, 2018, 15:44:26
Archived on: Thursday, April 10, 2014, 14:50:59

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00926199, version 1

Citation

Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Jack Dongarra. Optimal Checkpointing Period: Time vs. Energy. Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov 2013, Denver, United States. 2013. 〈hal-00926199〉
