Benchmarking and comparison of the task graph scheduling algorithms, J. Parallel Distrib. Comput, vol.59, issue.3, pp.381-422, 1999. ,
Scheduling of scientific workflows in the ASKALON grid environment, SIGMOD Rec, vol.34, issue.3, pp.56-62, 2005. ,
Workflows and e-Science: An overview of workflow system features and capabilities, Future Gener. Comput. Syst, vol.25, issue.5, pp.528-540, 2009. ,
DOI : 10.1016/j.future.2008.06.012
Scheduling Algorithms, 2004. ,
, Scheduling: Theory, Algorithms, and Systems, 2016.
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Trans. Parallel Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
URL : http://meseec.ce.rit.edu/eecc722-fall2002/papers/hc/5/l0260.pdf
Toward exascale resilience: 2014 update, Supercomputing frontiers and innovations, vol.1, 2014. ,
DOI : 10.1177/1094342009347767
The impact of new technology on soft error rates, IEEE International Reliability Physics Symposium (IRPS), 2011. ,
The effects of energy management on reliability in real-time embedded systems, Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp.35-40, 2004. ,
, Fault-Tolerance Techniques for High-Performance Computing, ser. Computer Communications and Networks, 2015.
Detection and correction of silent data corruption for large-scale high-performance computing, Proc. of the ACM/IEEE SC Int. Conf., ser. SC '12, 2012. ,
DOI : 10.1109/ipdps.2011.379
URL : http://repository.lib.ncsu.edu/bitstream/1840.4/8260/1/TR-2012-5.pdf
Silent error detection in numerical time-stepping schemes, CoRR, 2013. ,
DOI : 10.1177/1094342014532297
URL : http://arxiv.org/pdf/1312.2674
Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods, Proc. 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '13, pp.167-176, 2013. ,
Self-stabilizing iterative solvers, Proc. ScalA '13, 2013. ,
DOI : 10.1145/2530268.2530272
Algorithm-based fault tolerance applied to high performance computing, J. Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009. ,
DOI : 10.1016/j.jpdc.2008.12.002
Computational complexity of PERT problems, Networks, vol.18, issue.2, pp.139-147, 1988. ,
DOI : 10.1002/net.3230180206
The complexity of enumeration and reliability problems, SIAM J. Comput, vol.8, issue.3, pp.410-421, 1979. ,
DOI : 10.1137/0208032
The complexity of counting cuts and of computing the probability that a graph is connected, SIAM J. Comput, vol.12, issue.4, pp.777-788, 1983. ,
A note on the complexity of network reliability problems, IEEE Trans. Inf. Theory, vol.47, pp.1971-1988, 2004. ,
Scheduling under uncertainty: Bounding the makespan distribution, Computational Discrete Mathematics: Advanced Lectures, pp.79-97, 2001. ,
, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
Monte Carlo methods and the PERT problem, Operations Research, vol.11, issue.5, pp.839-860, 1963. ,
Correlation-aware heuristics for evaluating the distribution of the longest path length of a DAG with random weights, IEEE Trans. Parallel Distributed Systems, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01412922
The recognition of series parallel digraphs, Proc. 11th ACM Symp. on Theory of Computing, ser. STOC '79, pp.1-12, 1979. ,
DOI : 10.1145/800135.804393
Bounding the project completion time distribution in PERT networks, Operations Research, vol.33, issue.4, pp.862-881, 1985. ,
DOI : 10.1287/opre.33.4.862
The completion time of PERT networks, The Journal of the Operational Research Society, vol.34, issue.2, pp.155-158, 1983. ,
The greatest of a finite set of random variables, Operations Research, vol.9, issue.2, pp.145-162, 1961. ,
ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection, Proc. SC'13, 2013. ,
The use of triple-modular redundancy to improve computer reliability, IBM J. Res. Dev, vol.6, issue.2, pp.200-209, 1962. ,
DOI : 10.1147/rd.62.0200
Combining partial redundancy and checkpointing for HPC, Proc. ICDCS, 2012. ,
DOI : 10.1109/icdcs.2012.56
URL : http://moss.csc.ncsu.edu/%7Emueller/ftp/pub/mueller/papers/icdcs12.pdf
Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput, vol.33, issue.6, pp.518-528, 1984. ,
Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proc. ICS '12, 2012. ,
Fault-tolerant iterative methods via selective reliability, Sandia National Laboratories, 2011. ,
Soft error vulnerability of iterative linear algebra methods, Proc. 22nd Int. Conf. on Supercomputing, ser. ICS '08, pp.155-164, 2008. ,
Lightweight silent data corruption detection based on runtime data analysis for HPC applications, Proc. HPDC, 2015. ,
Detecting silent data corruption through data dynamic monitoring for scientific applications, SIGPLAN Notices, vol.49, issue.8, pp.381-382, 2014. ,
, Detecting and Correcting Data Corruption in Stencil Applications through Multivariate Interpolation, Proc. 1st Int. Workshop on Fault Tolerant Systems (FTS), 2015.
Reliability-aware dynamic voltage scaling for energy-constrained real-time embedded systems, Proceedings of the IEEE International Conference on Computer Design (ICCD), pp.633-639, 2008. ,
Energy-aware scheduling under reliability and makespan constraints, Proceedings of the International Conference on High Performance Computing (HiPC), pp.1-10, 2012. ,
URL : https://hal.archives-ouvertes.fr/inria-00630721
Combined DVFS and mapping exploration for lifetime and soft-error susceptibility improvement in MPSoCs, Proceedings of the Conference on Design, Automation & Test in Europe (DATE), pp.61-62, 2014. ,
, Introduction to Algorithms, 2009.
The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines, Scientific Programming, vol.5, pp.173-184, 1996. ,
StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurr. Comput.: Pract. Exper, vol.23, issue.2, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices Version 4.0, U. of Minnesota, Dpt. of Comp. Sci. and Eng, 1998. ,
Geometric Mesh Partitioning: Implementation and Experiments, SIAM Journal on Scientific Computing, vol.19, issue.6, pp.2091-2110, 1998. ,
Scheduling delta-critical tasks in mixed-parallel applications on a national grid, Int. Conf. Grid Computing, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00165868