Y. Kwok and I. Ahmad, Benchmarking and comparison of the task graph scheduling algorithms, J. Parallel Distrib. Comput, vol.59, issue.3, pp.381-422, 1999.

M. Wieczorek, R. Prodan, and T. Fahringer, Scheduling of scientific workflows in the ASKALON grid environment, SIGMOD Rec, vol.34, issue.3, pp.56-62, 2005.

E. Deelman, D. Gannon, M. Shields, and I. Taylor, Workflows and e-Science: An overview of workflow system features and capabilities, Future Gener. Comput. Syst, vol.25, issue.5, pp.528-540, 2009.
DOI : 10.1016/j.future.2008.06.012

P. Brucker, Scheduling Algorithms, 2004.

M. L. Pinedo, Scheduling: Theory, Algorithms, and Systems, 2016.

H. Topcuoglu, S. Hariri, and M. Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Trans. Parallel Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

URL : http://meseec.ce.rit.edu/eecc722-fall2002/papers/hc/5/l0260.pdf

F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer et al., Toward exascale resilience: 2014 update, Supercomputing frontiers and innovations, vol.1, 2014.
DOI : 10.1177/1094342009347767

A. Dixit and A. Wood, The impact of new technology on soft error rates, IEEE International on Reliability Physics Symposium (IRPS), 2011.

D. Zhu, R. Melhem, and D. Mosse, The effects of energy management on reliability in real-time embedded systems, Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp.35-40, 2004.

, Fault-Tolerance Techniques for High-Performance Computing, ser. Computer Communications and Networks, 2015.

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and correction of silent data corruption for large-scale high-performance computing, Proc. of the ACM/IEEE SC Int. Conf., ser. SC '12, 2012.
DOI : 10.1109/ipdps.2011.379

URL : http://repository.lib.ncsu.edu/bitstream/1840.4/8260/1/TR-2012-5.pdf

A. R. Benson, S. Schmit, and R. Schreiber, Silent error detection in numerical time-stepping schemes, CoRR, 2013.
DOI : 10.1177/1094342014532297

URL : http://arxiv.org/pdf/1312.2674

Z. Chen, Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods, Proc. 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '13, pp.167-176, 2013.

P. Sao and R. Vuduc, Self-stabilizing iterative solvers, Proc. ScalA '13, 2013.
DOI : 10.1145/2530268.2530272

G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, Algorithm-based fault tolerance applied to high performance computing, J. Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002

J. N. Hagstrom, Computational complexity of PERT problems, Networks, vol.18, issue.2, pp.139-147, 1988.
DOI : 10.1002/net.3230180206

L. G. Valiant, The complexity of enumeration and reliability problems, SIAM J. Comput, vol.8, issue.3, pp.410-421, 1979.
DOI : 10.1137/0208032

J. S. Provan and M. O. Ball, The complexity of counting cuts and of computing the probability that a graph is connected, SIAM J. Comput, vol.12, issue.4, pp.777-788, 1983.

H. L. Bodlaender and T. Wolle, A note on the complexity of network reliability problems, IEEE Trans. Inf. Theory, vol.47, pp.1971-1988, 2004.

R. H. Möhring, Scheduling under uncertainty: Bounding the makespan distribution, Computational Discrete Mathematics: Advanced Lectures, pp.79-97, 2001.

M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.

R. M. Van Slyke, Monte Carlo methods and the PERT problem, Operations Research, vol.11, issue.5, pp.839-860, 1963.

L. C. Canon and E. Jeannot, Correlation-aware heuristics for evaluating the distribution of the longest path length of a DAG with random weights, IEEE Trans. Parallel Distributed Systems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01412922

J. Valdes, R. E. Tarjan, and E. L. Lawler, The recognition of series parallel digraphs, Proc. 11th ACM Symp. on Theory of Computing, ser. STOC '79, pp.1-12, 1979.
DOI : 10.1145/800135.804393

B. Dodin, Bounding the project completion time distribution in PERT networks, Operations Research, vol.33, issue.4, pp.862-881, 1985.
DOI : 10.1287/opre.33.4.862

D. Sculli, The completion time of PERT networks, The Journal of the Operational Research Society, vol.34, issue.2, pp.155-158, 1983.

C. E. Clark, The greatest of a finite set of random variables, Operations Research, vol.9, issue.2, pp.145-162, 1961.

X. Ni, E. Meneses, N. Jain, and L. V. Kalé, ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection, Proc. SC'13, 2013.

R. E. Lyons and W. Vanderkulk, The use of triple-modular redundancy to improve computer reliability, IBM J. Res. Dev, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200

J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira et al., Combining partial redundancy and checkpointing for HPC, Proc. ICDCS, 2012.
DOI : 10.1109/icdcs.2012.56

URL : http://moss.csc.ncsu.edu/%7Emueller/ftp/pub/mueller/papers/icdcs12.pdf

K. Huang and J. A. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput, vol.33, issue.6, pp.518-528, 1984.

M. Shantharam, S. Srinivasmurthy, and P. Raghavan, Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proc. ICS '12, 2012.

M. Heroux and M. Hoemmen, Fault-tolerant iterative methods via selective reliability, Sandia National Laboratories, 2011.

G. Bronevetsky and B. de Supinski, Soft error vulnerability of iterative linear algebra methods, Proc. 22nd Int. Conf. on Supercomputing, ser. ICS '08, pp.155-164, 2008.

E. Berrocal, L. Bautista-Gomez, S. Di, Z. Lan, and F. Cappello, Lightweight silent data corruption detection based on runtime data analysis for HPC applications, Proc. HPDC, 2015.

L. Bautista-Gomez and F. Cappello, Detecting silent data corruption through data dynamic monitoring for scientific applications, SIGPLAN Notices, vol.49, issue.8, pp.381-382, 2014.

, Detecting and Correcting Data Corruption in Stencil Applications through Multivariate Interpolation, Proc. 1st Int. Workshop on Fault Tolerant Systems (FTS), 2015.

B. Zhao, H. Aydin, and D. Zhu, Reliability-aware dynamic voltage scaling for energy-constrained real-time embedded systems, Proceedings of the IEEE International Conference on Computer Design (ICCD), pp.633-639, 2008.

G. Aupy, A. Benoit, and Y. Robert, Energy-aware scheduling under reliability and makespan constraints, Proceedings of the International Conference on High Performance Computing (HiPC), pp.1-10, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00630721

A. Das, A. Kumar, B. Veeravalli, C. Bolchini, and A. Miele, Combined DVFS and mapping exploration for lifetime and soft-error susceptibility improvement in MPSoCs, Proceedings of the Conference on Design, Automation & Test in Europe (DATE), pp.61-62, 2014.

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2009.

J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker et al., The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines, Scientific Programming, vol.5, pp.173-184, 1996.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurr. Comput.: Pract. Exper, vol.23, issue.2, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

G. Karypis and V. Kumar, MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices Version 4.0, U. of Minnesota, Dpt. of Comp. Sci. and Eng, 1998.

J. R. Gilbert, G. L. Miller, and S. Teng, Geometric Mesh Partitioning: Implementation and Experiments, SIAM Journal on Scientific Computing, vol.19, issue.6, pp.2091-2110, 1998.

F. Suter, Scheduling delta-critical tasks in mixed-parallel applications on a national grid, Int. Conf. Grid Computing, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00165868