Z. Bai and W. Wu, On greedy randomized Kaczmarz method for solving large sparse linear systems, SIAM J. Sci. Comput, vol.40, issue.1, pp.592-606, 2018.

Z. Bai and W. Wu, On greedy randomized coordinate descent methods for solving large linear least-squares problems, Numer. Linear Algebr. Appl, vol.26, issue.4, pp.1-15, 2019.

A. Benoit, A. Cavelan, V. Le Fèvre, Y. Robert, and H. Sun, Towards optimal multi-level checkpointing, IEEE Trans. Computers, vol.66, issue.7, pp.1212-1226, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01339788

V. Bharadwaj, D. Ghose, and T. Robertazzi, Divisible load theory: a new paradigm for load scheduling in distributed systems, Cluster Computing, vol.6, issue.1, pp.7-17, 2003.

R. H. Bisseling and A. N. Yzelman, Thinking in sync: The bulk-synchronous parallel approach to large-scale computing, ACM Computing Reviews, vol.57, issue.6, pp.322-327, 2016.

D. Blackston and T. Suel, Highly portable and efficient implementations of parallel adaptive n-body methods, Proc. ACM Supercomputing, pp.1-20, 1997.

F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer et al., Toward Exascale Resilience, Int. J. High Performance Computing Applications, vol.23, issue.4, pp.374-388, 2009.

F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer et al., Toward exascale resilience: 2014 update, Supercomputing frontiers and innovations, vol.1, issue.1, 2014.

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Comp. Syst, vol.22, issue.3, pp.303-312, 2006.

S. Di, M. S. Bouguerra, L. Bautista-Gomez, and F. Cappello, Optimization of multi-level checkpoint model for large scale HPC applications, IPDPS. IEEE, 2014.

A. V. Gerbessiotis and L. G. Valiant, Direct bulk-synchronous parallel algorithms, J. Parallel Distributed Computing, vol.22, issue.2, pp.251-267, 1994.

R. M. Gower and P. Richtárik, Randomized iterative methods for linear systems, SIAM Journal on Matrix Analysis and Applications, vol.36, issue.4, pp.1660-1690, 2015.

L. Han, L. Canon, H. Casanova, Y. Robert, and F. Vivien, Checkpointing workflows for fail-stop errors, IEEE Trans. Computers, vol.67, issue.8, pp.1105-1120, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01701611

L. Han, V. Le Fèvre, L. Canon, Y. Robert, and F. Vivien, A Generic Approach to Scheduling and Checkpointing Workflows, Int. Journal of High Performance Computing Applications, vol.33, issue.6, pp.1255-1274, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02140295

P. Hanrahan, D. Salzman, and L. Aupperle, A rapid hierarchical radiosity algorithm, Proceedings of the 18th annual conference on Computer graphics and interactive techniques, pp.197-206, 1991.

T. Herault and Y. Robert, Fault-Tolerance Techniques for High-Performance Computing, Computer Communications and Networks, 2015.

J. M. Hill, B. McColl, D. C. Stefanescu, M. W. Goudreau, K. Lang et al., BSPlib: The BSP programming library, Parallel Computing, vol.24, issue.14, pp.1947-1980, 1998.

D. Leventhal and A. S. Lewis, Randomized methods for linear constraints: convergence rates and conditioning, Math. Operations Research, vol.35, pp.641-654, 2010.

A. Ma, D. Needell, and A. Ramdas, Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods, SIAM Journal on Matrix Analysis and Applications, vol.36, issue.4, pp.1590-1604, 2015.

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, modeling, and evaluation of a scalable multi-level checkpointing system, 2010.

Code for ICPP double-blind review.

D. Petcu, The performance of parallel iterative solvers, Computers and Mathematics with Applications, vol.50, issue.7, pp.1179-1189, 2005.

T. Robertazzi, Ten reasons to use divisible load theory, IEEE Computer, vol.36, issue.5, pp.63-68, 2003.

Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, 2003.

T. Strohmer and R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Analysis and Applications, vol.15, issue.2, p.262, 2009.

Top500, Top 500 Supercomputer Sites, 2018.

S. Toueg and O. Babaoğlu, On the optimum checkpoint selection problem, SIAM J. Comput, vol.13, issue.3, 1984.

L. G. Valiant, The complexity of enumeration and reliability problems, SIAM J. Comput, vol.8, issue.3, pp.410-421, 1979.

J. W. Young, A first order approximation to the optimum checkpoint interval, Comm. of the ACM, vol.17, issue.9, pp.530-531, 1974.