Experiments on Checkpointing Adjoint MPI Programs

Abstract : Checkpointing is a classical strategy for reducing the peak memory consumption of an adjoint code. It is vital for long-running codes, which includes most MPI parallel applications. For MPI codes, however, checkpointing has so far been handled by ad hoc manual modification of the differentiated code, with no formal assurance of correctness. In previous work, we investigated the assumptions implicitly made in past experiments, in order to clarify and generalize them. On the one hand, we proposed an adaptation of checkpointing to MPI parallel programs with point-to-point communications that preserves the semantics of the adjoint program for any choice of the checkpointed part. On the other hand, we proposed an alternative adaptation that is more efficient but imposes a number of restrictions on the choice of the checkpointed part. In this work, we consider checkpointing of MPI parallel programs from a practical point of view. We propose an implementation of the adapted techniques inside the AMPI library. We discuss practical questions about which technique to apply within a checkpointed part and how to choose the checkpointed part itself. Finally, we validate our theoretical results on representative CFD codes.
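The memory trade-off that the abstract attributes to checkpointing can be illustrated with a minimal sequential sketch. It is not the paper's method and does not use MPI or the AMPI library: the toy model `step`, the function names, and the fixed segment length `seg` are all assumptions made for illustration. The sketch compares an adjoint that stores every intermediate state with one that keeps a snapshot per segment and recomputes each segment during the backward sweep.

```python
import math

def step(x):
    # One forward step of a toy time-stepping code (hypothetical model).
    return math.sin(x) + 0.5 * x

def step_adjoint(x, xbar_out):
    # Adjoint of `step`: scales the incoming adjoint by d(step)/dx at x.
    return (math.cos(x) + 0.5) * xbar_out

def adjoint_store_all(x0, n):
    # Forward sweep stores every intermediate state; backward sweep pops them.
    tape = []
    x = x0
    for _ in range(n):
        tape.append(x)
        x = step(x)
    peak = len(tape)  # peak number of stored states
    xbar = 1.0
    while tape:
        xbar = step_adjoint(tape.pop(), xbar)
    return xbar, peak

def adjoint_checkpointed(x0, n, seg):
    # Forward sweep keeps only one snapshot per segment of `seg` steps.
    snapshots = []
    x = x0
    for k in range(n):
        if k % seg == 0:
            snapshots.append((k, x))
        x = step(x)
    peak = 0
    xbar = 1.0
    end = n
    while snapshots:
        start, xs = snapshots[-1]
        # Recompute the segment from its snapshot, taping its states.
        tape = []
        x = xs
        for _ in range(start, end):
            tape.append(x)
            x = step(x)
        peak = max(peak, len(snapshots) + len(tape))
        snapshots.pop()
        # Backward sweep over the recomputed segment only.
        while tape:
            xbar = step_adjoint(tape.pop(), xbar)
        end = start
    return xbar, peak
```

Calling `adjoint_store_all(0.3, 100)` and `adjoint_checkpointed(0.3, 100, 10)` should give the same derivative, with the checkpointed version holding far fewer states at its peak in exchange for one extra forward recomputation of each segment. In an MPI code the recomputed segment would also repeat its communications, which is the difficulty the abstract refers to.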

https://hal.inria.fr/hal-01413402
Contributor : Laurent Hascoet
Submitted on : Friday, December 9, 2016 - 5:07:51 PM
Last modification on : Thursday, January 11, 2018 - 4:48:46 PM
Long-term archiving on : Monday, March 27, 2017 - 3:05:42 PM

File

ExperimentMPIPaper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01413402, version 1

Citation

Ala Taftaf, Laurent Hascoët. Experiments on Checkpointing Adjoint MPI Programs. 11th ASMO UK/ISSMO/NOED2016: International Conference on Numerical Optimisation Methods for Engineering Design, Jul 2016, Munich, Germany. ⟨hal-01413402⟩
