Reports (Research report)

Towards resilient parallel linear Krylov solvers: recover-restart strategies

Abstract : The advent of extreme-scale machines will require the use of parallel resources at an unprecedented scale, probably leading to a high rate of hardware faults. High Performance Computing (HPC) applications that aim to exploit all these resources will thus need to be resilient, i.e., able to compute a correct solution in the presence of faults. In this work, we investigate possible remedies in the framework of the solution of large sparse linear systems, which is often the innermost numerical kernel in many scientific and engineering applications and also one of the most time-consuming parts. More precisely, we present recovery-followed-by-restart strategies in the framework of Krylov subspace solvers, where lost entries of the iterate are interpolated to define a new initial guess before restarting. In particular, we consider two interpolation policies that preserve key numerical properties of well-known solvers, namely the monotonic decrease of the A-norm of the error of the conjugate gradient (CG) method or the residual norm decrease of GMRES. We assess the impact of the recovery method, the fault rate and the number of processors on the robustness of the resulting linear solvers. We consider experiments with CG, GMRES and Bi-CGStab.
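To make the recover-restart idea concrete, here is a minimal sketch of one possible interpolation policy. The abstract does not give the formulas, so the following is an assumption made for illustration: the lost entries x_I are recomputed from the surviving entries x_J by solving the local system A_II x_I = b_I - A_IJ x_J, with A symmetric positive definite. Under that assumption the recovered vector minimizes the A-norm of the error over the lost entries, so the A-norm of the error cannot exceed its value before the fault. The names `solve_dense`, `interpolate_lost_entries` and `a_norm_error` are hypothetical helpers, and dense Python lists stand in for distributed sparse data.

```python
def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Pick the largest pivot in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def interpolate_lost_entries(A, b, x, lost):
    """Rebuild x[lost] from surviving entries (assumed policy):
    solve A_II x_I = b_I - A_IJ x_J and return the patched vector."""
    lost = sorted(lost)
    lost_set = set(lost)
    kept = [j for j in range(len(x)) if j not in lost_set]
    A_II = [[A[i][k] for k in lost] for i in lost]
    rhs = [b[i] - sum(A[i][j] * x[j] for j in kept) for i in lost]
    x_I = solve_dense(A_II, rhs)
    recovered = list(x)
    for i, value in zip(lost, x_I):
        recovered[i] = value
    return recovered


def a_norm_error(A, x, x_exact):
    """Return sqrt(e^T A e) for e = x - x_exact."""
    e = [xi - xe for xi, xe in zip(x, x_exact)]
    Ae = [sum(A[i][j] * e[j] for j in range(len(e))) for i in range(len(e))]
    return sum(ei * aei for ei, aei in zip(e, Ae)) ** 0.5


# Small SPD test problem: tridiagonal (-1, 2, -1) matrix of order 5.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
x_exact = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [sum(A[i][j] * x_exact[j] for j in range(n)) for i in range(n)]

iterate = [1.1, 1.9, 3.2, 3.9, 5.1]   # iterate just before the fault
lost = [1, 2]                          # entries held by the failed process
faulty = [0.0 if i in lost else v for i, v in enumerate(iterate)]
recovered = interpolate_lost_entries(A, b, faulty, lost)

print("A-norm error before fault  :", a_norm_error(A, iterate, x_exact))
print("A-norm error after recovery:", a_norm_error(A, recovered, x_exact))
```

In a recover-restart scheme, `recovered` would then be passed to the Krylov solver as the new initial guess. A policy aimed at GMRES would presumably target the residual norm rather than the A-norm of the error; it is not sketched here.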

Cited literature : 26 references
Contributor : Luc Giraud
Submitted on : Friday, July 12, 2013 - 3:26:27 PM
Last modification on : Thursday, October 27, 2022 - 4:02:34 AM
Long-term archiving on : Wednesday, April 5, 2017 - 10:39:05 AM




  • HAL Id : hal-00843992, version 1


Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Jean Roman, Mawussi Zounon. Towards resilient parallel linear Krylov solvers: recover-restart strategies. [Research Report] RR-8324, INRIA. 2013, pp.36. ⟨hal-00843992⟩


