Conference paper. Year: 2013

Resilience at extreme scale: system level, algorithmic level or both?

Abstract

Resilience is a critical problem for extreme-scale numerical simulations. The most credible solution is still checkpoint/restart, with its high overheads or hardware cost. It has been shown recently that some algorithmic approaches and some code characteristics can help reduce these costs through combined system-level and algorithmic/application-level approaches. However, we are still looking for the right answer to this simple question: how can state-saving and recovery times both be reduced significantly at the same time?
No file deposited

Dates and versions

hal-00799309 , version 1 (25-07-2013)

Identifiers

  • HAL Id : hal-00799309 , version 1

Cite

Luc Giraud, Franck Cappello. Resilience at extreme scale: system level, algorithmic level or both? SIAM Conference on Computational Science and Engineering (SIAM CSE 2013), Feb 2013, Boston, United States. ⟨hal-00799309⟩