E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan et al., Pegasus, a workflow management system for science automation, Future Generation Computer Systems, vol.46, pp.17-35, 2015.
DOI : 10.1016/j.future.2014.10.008

URL : https://doi.org/10.1016/j.future.2014.10.008

T. Fahringer, R. Prodan, R. Duan, J. Hofer, F. Nadeem et al., ASKALON: A Development and Grid Computing Environment for Scientific Workflows, Workflows for e-Science, pp.450-471, 2007.
DOI : 10.1007/978-1-84628-757-2_27

M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz et al., Swift: A language for distributed parallel scripting, Parallel Computing, vol.37, issue.9, pp.633-652, 2011.
DOI : 10.1016/j.parco.2011.05.005

URL : http://www.ci.uchicago.edu/~wilde/SwiftParallelScripting.Parco2011.pdf

K. Wolstencroft, R. Haines, D. Fellows, A. Williams, D. Withers et al., The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud, Nucleic Acids Research, vol.41, issue.W1, pp.W557-W561, 2013.
DOI : 10.1093/nar/gkt328

I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher et al., Kepler: an extensible system for design and execution of scientific workflows, Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp.423-424, 2004.
DOI : 10.1109/SSDM.2004.1311241

M. Albrecht, P. Donnelly, P. Bui, and D. Thain, Makeflow: A Portable Abstraction for Data Intensive Computing on Clusters, Clouds, and Grids, Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, SWEET '12, p.1, 2012.
DOI : 10.1145/2443416.2443417

F. Zhang, C. Docan, M. Parashar, S. Klasky, N. Podhorszki et al., Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1352-1363, 2012.
DOI : 10.1109/IPDPS.2012.122

J. N. Hagstrom, Computational complexity of PERT problems, Networks, vol.18, issue.2, pp.139-147, 1988.
DOI : 10.1002/net.3230180206

M. L. Pinedo, Scheduling: Theory, Algorithms, and Systems, 2016.

L. G. Valiant, The Complexity of Enumeration and Reliability Problems, SIAM Journal on Computing, vol.8, issue.3, pp.410-421, 1979.
DOI : 10.1137/0208032

J. S. Provan and M. O. Ball, The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected, SIAM Journal on Computing, vol.12, issue.4, pp.777-788, 1983.
DOI : 10.1137/0212053

H. L. Bodlaender and T. Wolle, A note on the complexity of network reliability problems, IEEE Trans. Inf. Theory, vol.47, pp.1971-1988, 2004.

J. Valdes, R. E. Tarjan, and E. L. Lawler, The recognition of series parallel digraphs, Proc. 11th ACM Symp. Theory of Computing, STOC '79, pp.1-12, 1979.

H. L. Bodlaender and B. de Fluiter, Parallel algorithms for series parallel graphs, pp.277-289, 1996.
DOI : 10.1007/3-540-61680-2_62

URL : http://archive.cs.uu.nl/pub/RUU/CS/techreps/CS-1996/1996-13.ps.gz

A. Pothen and C. Sun, A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1257, 1993.
DOI : 10.1137/0914074

S. Toueg and O. Babaoğlu, On the Optimum Checkpoint Selection Problem, SIAM Journal on Computing, vol.13, issue.3, 1984.
DOI : 10.1137/0213039

URL : http://graal.ens-lyon.fr/%7Eabenoit/CR02/papers/chains1.pdf

Pegasus, Pegasus workflow generator, 2014.

R. H. Möhring, Scheduling under Uncertainty: Bounding the Makespan Distribution, Computational Discrete Mathematics: Advanced Lectures, pp.79-97, 2001.
DOI : 10.1007/3-540-45506-X_7

L. C. Canon and E. Jeannot, Correlation-Aware Heuristics for Evaluating the Distribution of the Longest Path Length of a DAG with Random Weights, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.11, 2016.
DOI : 10.1109/TPDS.2016.2528983

URL : https://hal.archives-ouvertes.fr/hal-01412922

D. Sculli, The Completion Time of PERT Networks, Journal of the Operational Research Society, vol.34, issue.2, pp.155-158, 1983.
DOI : 10.1057/jors.1983.27

H. Casanova, J. Herrmann, and Y. Robert, Computing the Expected Makespan of Task Graphs in the Presence of Silent Errors, 2016 45th International Conference on Parallel Processing Workshops (ICPPW), 2016.
DOI : 10.1109/ICPPW.2016.34

URL : https://hal.archives-ouvertes.fr/hal-01354711

M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
DOI : 10.1017/CBO9780511813603

R. M. Van Slyke, Letter to the Editor: Monte Carlo Methods and the PERT Problem, Operations Research, vol.11, issue.5, pp.839-860, 1963.
DOI : 10.1287/opre.11.5.839

R. F. da Silva, W. Chen, G. Juve, K. Vahi, and E. Deelman, Community Resources for Enabling Research in Distributed Scientific Workflows, 2014 IEEE 10th International Conference on e-Science, pp.177-184, 2014.
DOI : 10.1109/eScience.2014.44

S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M. Su et al., Characterization of scientific workflows, 2008 Third Workshop on Workflows in Support of Large-Scale Science, pp.1-10, 2008.
DOI : 10.1109/WORKS.2008.4723958

G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta et al., Characterizing and profiling scientific workflows, Future Generation Computer Systems, vol.29, issue.3, pp.682-692, 2013.
DOI : 10.1016/j.future.2012.08.015

E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil et al., Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, pp.219-237, 2005.
DOI : 10.1155/2005/128026

URL : https://doi.org/10.1155/2005/128026

L. Han, L. Canon, H. Casanova, Y. Robert, and F. Vivien, Checkpointing Workflows for Fail-Stop Errors, 2017 IEEE International Conference on Cluster Computing (CLUSTER), 2017.
DOI : 10.1109/CLUSTER.2017.14

URL : https://hal.archives-ouvertes.fr/hal-01559967

C. Cao, T. Herault, G. Bosilca, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-Based Runtime, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.765-774, 2015.
DOI : 10.1109/IPDPS.2015.81

H. Jin, X. Sun, Z. Zheng, Z. Lan, and B. Xie, Performance Under Failures of DAG-based Parallel Computing, CCGRID '09, 2009.
DOI : 10.1109/ccgrid.2009.55

URL : http://www.cs.iit.edu/~scs/psfiles/3622a236.pdf

E. Kail, P. Kacsuk, and M. Kozlovszky, A novel adaptive checkpointing method based on information obtained from workflow structure, Computer Science, vol.17, issue.3, 2016.
DOI : 10.7494/csci.2016.17.3.387

M. Chtepen, F. H. Claeys, B. Dhoedt, F. De Turck, P. Demeester et al., Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.2, pp.180-190, 2009.
DOI : 10.1109/TPDS.2008.93

M. C. Kurt, S. Krishnamoorthy, K. Agrawal, and G. Agrawal, Fault-Tolerant Dynamic Task Graph Scheduling, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.719-730, 2014.
DOI : 10.1109/SC.2014.64

J. A. Moreno, O. S. Unsal, J. Labarta, and A. Cristal, Nanocheckpoints: A task-based asynchronous dataflow framework for efficient and scalable checkpoint/restart, 23rd Euromicro PDP, pp.99-102, 2015.

O. Subasi, O. S. Unsal, J. Labarta, G. Yalcin, and A. Cristal, CRC-Based Memory Reliability for Task-Parallel HPC Applications, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.1101-1112, 2016.
DOI : 10.1109/IPDPS.2016.70

K. Huang and J. A. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput., vol.33, issue.6, pp.518-528, 1984.

G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002

M. Shantharam, S. Srinivasmurthy, and P. Raghavan, Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, 2012.
DOI : 10.1145/2304576.2304588

E. Berrocal, L. Bautista-Gomez, S. Di, Z. Lan, and F. Cappello, Lightweight Silent Data Corruption Detection Based on Runtime Data Analysis for HPC Applications, Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '15, 2015.
DOI : 10.1145/1810085.1810120

L. Bautista-Gomez and F. Cappello, Detecting silent data corruption through data dynamic monitoring for scientific applications, SIGPLAN Notices, pp.381-382, 2014.

K. Hashimoto, T. Tsuchiya, and T. Kikuno, Fault-secure scheduling of arbitrary task graphs to multiprocessor systems, Proceedings International Conference on Dependable Systems and Networks. DSN 2000, pp.203-212, 2000.
DOI : 10.1109/ICDSN.2000.857536

A. Girault and H. Kalla, A Novel Bicriteria Scheduling Heuristics Providing a Guaranteed Global System Failure Rate, IEEE Transactions on Dependable and Secure Computing, vol.6, issue.4, pp.241-254, 2009.
DOI : 10.1109/TDSC.2008.50

URL : https://hal.archives-ouvertes.fr/hal-00746768

O. Subasi, G. Yalcin, F. Zyulkyarov, O. Unsal, and J. Labarta, Designing and Modelling Selective Replication for Fault-Tolerant HPC Applications, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017.
DOI : 10.1109/CCGRID.2017.40

A. Benoit, A. Cavelan, Y. Robert, and H. Sun, Assessing general-purpose algorithms to cope with fail-stop and silent errors, ACM Trans. Parallel Computing, vol.3, issue.2, 2016.
DOI : 10.1007/978-3-319-17248-4_11

URL : https://hal.archives-ouvertes.fr/hal-01066664

G. Aupy, A. Benoit, H. Casanova, and Y. Robert, Scheduling Computational Workflows on Failure-Prone Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp.2-26, 2016.
DOI : 10.1109/IPDPSW.2015.33

URL : https://hal.archives-ouvertes.fr/hal-01075100

P. Wang, K. Zhang, R. Chen, H. Chen, and H. Guan, Replication-Based Fault-Tolerance for Large-Scale Graph Processing, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp.562-573, 2014.
DOI : 10.1109/DSN.2014.58

I. Assayad, A. Girault, and H. Kalla, A bi-criteria scheduling heuristic for distributed embedded systems under reliability and real-time constraints, International Conference on Dependable Systems and Networks, 2004.
DOI : 10.1109/DSN.2004.1311904

G. Jacques-silva, Z. Kalbarczyk, B. Gedik, H. Andrade, K. L. Wu et al., Modeling stream processing applications for dependability evaluation, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011.
DOI : 10.1109/DSN.2011.5958256

J. Díaz, J. Petit, and M. Serna, A survey of graph layout problems, ACM Computing Surveys, vol.34, issue.3, pp.313-356, 2002.
DOI : 10.1145/568522.568523