Y. Kwok and I. Ahmad, Benchmarking and Comparison of the Task Graph Scheduling Algorithms, Journal of Parallel and Distributed Computing, vol.59, issue.3, pp.381-422, 1999.
DOI : 10.1006/jpdc.1999.1578

M. Wieczorek, R. Prodan, and T. Fahringer, Scheduling of scientific workflows in the ASKALON grid environment, ACM SIGMOD Record, vol.34, issue.3, pp.56-62, 2005.
DOI : 10.1145/1084805.1084816

E. Deelman, D. Gannon, M. Shields, and I. Taylor, Workflows and e-Science: An overview of workflow system features and capabilities, Future Generation Computer Systems, vol.25, issue.5, pp.528-540, 2009.
DOI : 10.1016/j.future.2008.06.012

M. L. Pinedo, Scheduling: Theory, Algorithms, and Systems, 2016.

H. Topcuoglu, S. Hariri, and M. Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer et al., Toward Exascale Resilience, International Journal of High Performance Computing Applications, vol.23, issue.4, 2009.
DOI : 10.1177/1094342009347767

A. Dixit and A. Wood, The impact of new technology on soft error rates, 2011 International Reliability Physics Symposium, pp.5-9, 2011.
DOI : 10.1109/IRPS.2011.5784522

D. Zhu, R. Melhem, and D. Mosse, The effects of energy management on reliability in real-time embedded systems, Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.35-40, 2004.

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and correction of silent data corruption for large-scale high-performance computing, Proc. of the ACM/IEEE SC Int. Conf., ser. SC '12, 2012.

A. R. Benson, S. Schmit, and R. Schreiber, Silent error detection in numerical time-stepping schemes, International Journal of High Performance Computing Applications, vol.29, issue.4, 2015.
DOI : 10.1177/1094342014532297

Z. Chen, Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods, Proc. 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '13, pp.167-176, 2013.

P. Sao and R. Vuduc, Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013.
DOI : 10.1145/2530268.2530272

G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002

J. N. Hagstrom, Computational complexity of PERT problems, Networks, vol.18, issue.2, pp.139-147, 1988.
DOI : 10.1002/net.3230180206

L. G. Valiant, The Complexity of Enumeration and Reliability Problems, SIAM Journal on Computing, vol.8, issue.3, pp.410-421, 1979.
DOI : 10.1137/0208032

J. S. Provan and M. O. Ball, The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected, SIAM Journal on Computing, vol.12, issue.4, pp.777-788, 1983.
DOI : 10.1137/0212053

H. L. Bodlaender and T. Wolle, A note on the complexity of network reliability problems, IEEE Trans. Inf. Theory, vol.47, pp.1971-1988, 2004.

R. H. Möhring, Computational Discrete Mathematics: Advanced Lectures, ch. Scheduling under Uncertainty: Bounding the Makespan Distribution, pp.79-97, 2001.

M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
DOI : 10.1017/CBO9780511813603

R. M. Van Slyke, Letter to the Editor: Monte Carlo Methods and the PERT Problem, Operations Research, vol.11, issue.5, pp.839-860, 1963.
DOI : 10.1287/opre.11.5.839

L. C. Canon and E. Jeannot, Correlation-Aware Heuristics for Evaluating the Distribution of the Longest Path Length of a DAG with Random Weights, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.11, 2016.
DOI : 10.1109/TPDS.2016.2528983
URL : https://hal.archives-ouvertes.fr/hal-01412922

J. Valdes, R. E. Tarjan, and E. L. Lawler, The recognition of series parallel digraphs, Proc. 11th ACM Symp. on Theory of Computing, ser. STOC '79, pp.1-12, 1979.

B. Dodin, Bounding the Project Completion Time Distribution in PERT Networks, Operations Research, vol.33, issue.4, pp.862-881, 1985.
DOI : 10.1287/opre.33.4.862

D. Sculli, The Completion Time of PERT Networks, Journal of the Operational Research Society, vol.34, issue.2, pp.155-158, 1983.
DOI : 10.1057/jors.1983.27

C. E. Clark, The Greatest of a Finite Set of Random Variables, Operations Research, vol.9, issue.2, pp.145-162, 1961.
DOI : 10.1287/opre.9.2.145

X. Ni, E. Meneses, N. Jain, and L. V. Kalé, ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, 2013.
DOI : 10.1145/2503210.2503266

R. E. Lyons and W. Vanderkulk, The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200

J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira et al., Combining Partial Redundancy and Checkpointing for HPC, 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012.
DOI : 10.1109/ICDCS.2012.56

K. Huang and J. A. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput., vol.33, issue.6, pp.518-528, 1984.

M. Shantharam, S. Srinivasmurthy, and P. Raghavan, Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, 2012.
DOI : 10.1145/2304576.2304588

M. Heroux and M. Hoemmen, Fault-tolerant iterative methods via selective reliability, Sandia National Laboratories, 2011.

G. Bronevetsky and B. de Supinski, Soft error vulnerability of iterative linear algebra methods, Proceedings of the 22nd annual international conference on Supercomputing, ICS '08, pp.155-164, 2008.
DOI : 10.1145/1375527.1375552

E. Berrocal, L. Bautista-Gomez, S. Di, Z. Lan, and F. Cappello, Lightweight Silent Data Corruption Detection Based on Runtime Data Analysis for HPC Applications, Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '15, 2015.
DOI : 10.1145/2749246.2749253

L. Bautista-Gomez and F. Cappello, Detecting silent data corruption through data dynamic monitoring for scientific applications, SIGPLAN Notices, pp.381-382, 2014.

B. Zhao, H. Aydin, and D. Zhu, Reliability-aware Dynamic Voltage Scaling for energy-constrained real-time embedded systems, 2008 IEEE International Conference on Computer Design, pp.633-639, 2008.
DOI : 10.1109/ICCD.2008.4751927

G. Aupy, A. Benoit, and Y. Robert, Energy-aware scheduling under reliability and makespan constraints, 2012 19th International Conference on High Performance Computing, pp.1-10, 2012.
DOI : 10.1109/HiPC.2012.6507482
URL : https://hal.archives-ouvertes.fr/hal-00763384

A. Das, A. Kumar, B. Veeravalli, C. Bolchini, and A. Miele, Combined DVFS and mapping exploration for lifetime and soft-error susceptibility improvement in MPSoCs, Proceedings of the Conference on Design, Automation & Test in Europe, pp.611-61, 2014.

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2009.

J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker et al., Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, Scientific Programming, pp.173-184, 1996.
DOI : 10.1155/1996/483083

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363