Conference papers

Experiments on Checkpointing Adjoint MPI Programs

Ala Taftaf, Laurent Hascoët
Abstract: Checkpointing is a classical strategy for reducing the peak memory consumption of an adjoint code. It is vital for codes with long run times, as is the case for most MPI parallel applications. For MPI codes, however, checkpointing has always been handled by ad hoc hand manipulation of the differentiated code, with no formal assurance of correctness. In previous work, we investigated the assumptions implicitly made during past experiments in order to clarify and generalize them. On the one hand, we proposed an adaptation of checkpointing to MPI parallel programs with point-to-point communications, such that the semantics of the adjoint program is preserved for any choice of the checkpointed part. On the other hand, we proposed an alternative adaptation that is more efficient but places a number of restrictions on the choice of the checkpointed part. In this work, we consider the checkpointing of MPI parallel programs from a practical point of view. We propose an implementation of the adapted techniques inside the AMPI library. We discuss practical questions about which technique to apply within a checkpointed part and how to choose the checkpointed part itself. Finally, we validate our theoretical results on representative CFD codes.
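
To illustrate the memory trade-off that the abstract refers to, here is a minimal sketch of checkpointing for an adjoint computation. It is not taken from the paper and does not use AMPI or any MPI communication: it is a single-process toy in which the model step (`step`, here a sine), the function names, and the fixed segment length are all assumptions chosen for illustration. The store-all adjoint keeps every intermediate state, while the checkpointed adjoint keeps only one snapshot per segment and recomputes the states of each segment during the reverse sweep.

```python
import math


def step(x):
    """One forward step of a toy model (hypothetical; stands in for real code)."""
    return math.sin(x)


def step_adjoint(x, xbar_out):
    """Adjoint of y = sin(x): propagate the output sensitivity back to the input."""
    return math.cos(x) * xbar_out


def adjoint_store_all(x0, n):
    """Reverse-mode derivative of n chained steps, storing every state."""
    tape = []
    x = x0
    for _ in range(n):
        tape.append(x)          # save the input of each step
        x = step(x)
    peak = len(tape)            # peak number of stored states
    xbar = 1.0
    while tape:
        xbar = step_adjoint(tape.pop(), xbar)
    return xbar, peak


def adjoint_checkpointed(x0, n, segment):
    """Same derivative, storing one snapshot per segment of `segment` steps."""
    snapshots = []
    x = x0
    for k in range(n):
        if k % segment == 0:
            snapshots.append(x)  # snapshot taken at the start of each segment
        x = step(x)

    peak = 0
    xbar = 1.0
    for s in range(len(snapshots) - 1, -1, -1):
        start = s * segment
        end = min(start + segment, n)
        # Recompute the states of this segment from its snapshot.
        x = snapshots[s]
        tape = []
        for _ in range(start, end):
            tape.append(x)
            x = step(x)
        peak = max(peak, len(snapshots) + len(tape))
        snapshots.pop()          # this snapshot is no longer needed
        # Reverse sweep over the recomputed segment.
        while tape:
            xbar = step_adjoint(tape.pop(), xbar)
    return xbar, peak


if __name__ == "__main__":
    g_all, peak_all = adjoint_store_all(0.5, 100)
    g_ckp, peak_ckp = adjoint_checkpointed(0.5, 100, 10)
    print(g_all, peak_all)
    print(g_ckp, peak_ckp)
```

In this sketch both functions return the same derivative, but the checkpointed version holds fewer states at its peak in exchange for running each forward step a second time. The difficulty addressed by the paper, namely that a checkpointed part containing MPI communications is executed twice, does not arise in this single-process toy.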

Cited literature: 3 references
Contributor: Laurent Hascoet
Submitted on: Friday, December 9, 2016 - 5:07:51 PM
Last modification on: Saturday, June 25, 2022 - 11:24:38 PM
Long-term archiving on: Monday, March 27, 2017 - 3:05:42 PM


Files produced by the author(s)


  • HAL Id : hal-01413402, version 1



Ala Taftaf, Laurent Hascoët. Experiments on Checkpointing Adjoint MPI Programs. 11th ASMO UK/ISSMO/NOED2016: International Conference on Numerical Optimisation Methods for Engineering Design, Jul 2016, Munich, Germany. ⟨hal-01413402⟩


