Benchmarking and Comparison of the Task Graph Scheduling Algorithms, Journal of Parallel and Distributed Computing, vol.59, issue.3, pp.381-422, 1999. ,
DOI : 10.1006/jpdc.1999.1578
Scheduling of scientific workflows in the ASKALON grid environment, ACM SIGMOD Record, vol.34, issue.3, pp.56-62, 2005. ,
DOI : 10.1145/1084805.1084816
Workflows and e-Science: An overview of workflow system features and capabilities, Future Generation Computer Systems, vol.25, issue.5, pp.528-540, 2009. ,
DOI : 10.1016/j.future.2008.06.012
Scheduling: Theory, Algorithms, and Systems, 2016. ,
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
Toward Exascale Resilience, International Journal of High Performance Computing Applications, vol.23, issue.4, 2014. ,
DOI : 10.1177/1094342009347767
The impact of new technology on soft error rates, 2011 International Reliability Physics Symposium, pp.5-9, 2011. ,
DOI : 10.1109/IRPS.2011.5784522
The effects of energy management on reliability in real-time embedded systems, Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.35-40, 2004. ,
Detection and correction of silent data corruption for large-scale high-performance computing, Proc. of the ACM/IEEE SC Int. Conf., ser. SC '12, 2012. ,
Silent error detection in numerical time-stepping schemes, International Journal of High Performance Computing Applications, vol.29, issue.4, 1312. ,
DOI : 10.1177/1094342014532297
Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods, Proc. 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '13, pp.167-176, 2013. ,
Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013. ,
DOI : 10.1145/2530268.2530272
Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009. ,
DOI : 10.1016/j.jpdc.2008.12.002
Computational complexity of PERT problems, Networks, vol.8, issue.2, pp.139-147, 1988. ,
DOI : 10.1002/net.3230180206
The Complexity of Enumeration and Reliability Problems, SIAM Journal on Computing, vol.8, issue.3, pp.410-421, 1979. ,
DOI : 10.1137/0208032
The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected, SIAM Journal on Computing, vol.12, issue.4, pp.777-788, 1983. ,
DOI : 10.1137/0212053
A note on the complexity of network reliability problems, IEEE Trans. Inf. Theory, vol.47, pp.1971-1988, 2004. ,
Computational Discrete Mathematics: Advanced Lectures, ch. Scheduling under Uncertainty: Bounding the Makespan Distribution, pp.79-97, 2001. ,
Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005. ,
DOI : 10.1017/CBO9780511813603
Letter to the Editor???Monte Carlo Methods and the PERT Problem, Operations Research, vol.11, issue.5, pp.839-860, 1963. ,
DOI : 10.1287/opre.11.5.839
Correlation-Aware Heuristics for Evaluating the Distribution of the Longest Path Length of a DAG with Random Weights, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.11, 2016. ,
DOI : 10.1109/TPDS.2016.2528983
URL : https://hal.archives-ouvertes.fr/hal-01412922
The recognition of series parallel digraphs, Proc. 11th ACM Symp. on Theory of Computing, ser. STOC '79, pp.1-12, 1979. ,
Bounding the Project Completion Time Distribution in PERT Networks, Operations Research, vol.33, issue.4, pp.862-881, 1985. ,
DOI : 10.1287/opre.33.4.862
The Completion Time of PERT Networks, Journal of the Operational Research Society, vol.34, issue.2, pp.155-158, 1983. ,
DOI : 10.1057/jors.1983.27
The Greatest of a Finite Set of Random Variables, Operations Research, vol.9, issue.2, pp.145-162, 1961. ,
DOI : 10.1287/opre.9.2.145
ACR, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '13, 2013. ,
DOI : 10.1145/2503210.2503266
The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962. ,
DOI : 10.1147/rd.62.0200
Combining Partial Redundancy and Checkpointing for HPC, 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012. ,
DOI : 10.1109/ICDCS.2012.56
Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput, vol.33, issue.6, pp.518-528, 1984. ,
Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, 2012. ,
DOI : 10.1145/2304576.2304588
Fault-tolerant iterative methods via selective reliability, Sandia National Laboratories, 2011. ,
Soft error vulnerability of iterative linear algebra methods, Proceedings of the 22nd annual international conference on Supercomputing , ICS '08, pp.155-164, 2008. ,
DOI : 10.1145/1375527.1375552
Lightweight Silent Data Corruption Detection Based on Runtime Data Analysis for HPC Applications, Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '15, 2015. ,
DOI : 10.1145/2749246.2749253
Detecting silent data corruption through data dynamic monitoring for scientific applications, SIGPLAN Notices, pp.381-382, 2014. ,
Reliability-aware Dynamic Voltage Scaling for energy-constrained real-time embedded systems, 2008 IEEE International Conference on Computer Design, pp.633-639, 2008. ,
DOI : 10.1109/ICCD.2008.4751927
Energy-aware scheduling under reliability and makespan constraints, 2012 19th International Conference on High Performance Computing, pp.1-10, 2012. ,
DOI : 10.1109/HiPC.2012.6507482
URL : https://hal.archives-ouvertes.fr/hal-00763384
Combined DVFS and mapping exploration for lifetime and soft-error susceptibility improvement in MPSoCs, Proceedings of the Conference on Design, pp.611-61, 2014. ,
Introduction to Algorithms, 2009. ,
Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, Scientific Programming, pp.173-184, 1996. ,
DOI : 10.1155/1996/483083
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011. ,
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363