Optimal Checkpointing Period: Time vs. Energy

Abstract : This short paper deals with parallel scientific applications that use non-blocking, periodic coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for total execution time and consumed energy. We characterize the optimal period for both objectives, and we assess the range of time/energy trade-offs by instantiating the model with a set of realistic scenarios for Exascale systems. We place particular emphasis on I/O transfers, because the relative cost of communication is expected to increase dramatically, both in terms of latency and consumed energy, on future Exascale platforms.
https://hal.inria.fr/hal-00878938
Contributor : Equipe Roma
Submitted on : Thursday, October 31, 2013 - 11:24:42 AM
Last modification on : Wednesday, February 26, 2020 - 11:14:02 AM
Long-term archiving on : Saturday, February 1, 2014 - 4:26:08 AM

Identifiers

  • HAL Id : hal-00878938, version 1

Citation

Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Jack Dongarra. Optimal Checkpointing Period: Time vs. Energy. [Research Report] RR-8387, INRIA. 2013, pp.19. ⟨hal-00878938⟩
