Conference papers

Optimal Checkpointing Period: Time vs. Energy

Abstract : This short paper deals with parallel scientific applications that use non-blocking, periodic coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for total execution time and consumed energy. We characterize the optimal period for both objectives, and we assess the range of time/energy trade-offs by instantiating the model with a set of realistic scenarios for Exascale systems. We place particular emphasis on I/O transfers, because the relative cost of communication is expected to increase dramatically, both in terms of latency and consumed energy, on future Exascale platforms.
Contributor : Equipe Roma
Submitted on : Thursday, January 9, 2014 - 11:14:04 AM
Last modification on : Friday, September 30, 2022 - 4:12:06 AM
Long-term archiving on : Thursday, April 10, 2014 - 2:50:59 PM




  • HAL Id : hal-00926199, version 1



Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Jack Dongarra. Optimal Checkpointing Period: Time vs. Energy. Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov 2013, Denver, United States. ⟨hal-00926199⟩


