T. Nguyen, L. Trifan, and J. Désidéri, Resilient workflows for computational mechanics platforms, IOP Conference Series: Materials Science and Engineering, vol.10, 2010.
DOI : 10.1088/1757-899X/10/1/012015

URL : https://hal.archives-ouvertes.fr/inria-00524656

M. Caeiro-Rodriguez, T. Priol, and Z. Németh, Dynamicity in Scientific Workflows, p.31, 2008.

J. Yu and R. Buyya, A Taxonomy of Workflow Management Systems for Grid Computing, Journal of Grid Computing, vol.15, issue.5-6, 2005.
DOI : 10.1007/s10723-005-9010-8

Z. Lan and Y. Li, Adaptive Fault Management of Parallel Applications for High-Performance Computing, IEEE Transactions on Computers, vol.57, issue.12, 2008.

G. Kandaswamy, A. Mandal, and D. A. Reed, Fault Tolerance and Recovery of Scientific Workflows on Computational Grids, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008.
DOI : 10.1109/CCGRID.2008.79

International Exascale Software Project, Roadmap 1.0. http://www.exascale.org/mediawiki

A. H. M. ter Hofstede, W. M. P. van der Aalst, M. Adams, and N. Russell, Modern Business Process Automation: YAWL and its Support Environment, 2010.

J. Gu, Z. Zheng, Z. Lan, J. White, E. Hocks, and B.-H. Park, Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study

E. Deelman and Y. Gil, Managing Large-Scale Scientific Workflows in Distributed Environments: Experiences and Challenges, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), 2006.
DOI : 10.1109/E-SCIENCE.2006.261077

M. Adams, A. H. M. ter Hofstede, D. Edmond, and W. M. P. van der Aalst, Implementing dynamic flexibility in workflows using worklets, BPM Center Report BPM-06-06, 2006. Available: http://www.yawl-system.com/documents/Implementing%20Worklets

S. Gudenkauf et al., Using UNICORE and WS-BPEL for Scientific Workflow Execution in Grid Environments, Proceedings of 5th UNICORE Summit.

S. Shankar and D. J. DeWitt, Data driven workflow planning in cluster management systems, Proceedings of the 16th international symposium on High performance distributed computing, HPDC '07, 2007.
DOI : 10.1145/1272366.1272383

J. Yu, M. Kirley, and R. Buyya, Multi-objective planning for workflow execution on Grids, 2007 8th IEEE/ACM International Conference on Grid Computing, 2007.
DOI : 10.1109/GRID.2007.4354110

J. Wang, I. Altintas, C. Berkley, L. Gilbert, and M. B. Jones, A High-Level Distributed Execution Framework for Scientific Workflows, 2008 IEEE Fourth International Conference on eScience, 2008.
DOI : 10.1109/eScience.2008.166

E. Deelman, S. Callaghan, E. Field, H. Francoeur, R. Graves et al., Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), 2006.
DOI : 10.1109/E-SCIENCE.2006.261098

M. Ghanem, N. Azam, M. Boniface, and J. Ferris, Grid-Enabled Workflows for Industrial Product Design, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), 2006.
DOI : 10.1109/E-SCIENCE.2006.261180

L. Ramakrishnan, C. Koelbel, Y. Kee, R. Wolski, D. Nurmi et al., VGrADS, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, 2009.
DOI : 10.1145/1654059.1654107

M. Vasko and S. Dustdar, A view based analysis of workflow modeling languages, 14th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP'06), 2006.
DOI : 10.1109/PDP.2006.17

J. B. Weissman, Fault tolerant computing on the grid: what are my options?, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469), 1999.
DOI : 10.1109/HPDC.1999.805323

X. Besseron and T. Gautier, Optimised Recovery with a Coordinated Checkpoint/Rollback Protocol for Domain Decomposition Applications, Communications in Computer and Information Science, 2008.
DOI : 10.1109/TSE.1987.232562

URL : https://hal.archives-ouvertes.fr/hal-00691997

Z. Luo, Checkpointing for workflow recovery, ACM Southeast Regional Conference, 2000.

C. Buligon, S. Cechin, and I. Jansch-Pôrto, Implementing Rollback-Recovery Coordinated Checkpoints, Advanced Distributed Systems, 2005.

Z. Chen and J. J. Dongarra, Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006.

A. Agbaria, A. Freund, and R. Friedman, Evaluating Distributed Checkpointing Protocol, 23rd IEEE International Conference on Distributed Computing Systems (ICDCS'03), 2003.

K. Plankensteiner, R. Prodan, and T. Fahringer, A New Fault Tolerance Heuristic for Scientific Workflows in Highly Distributed Environments Based on Resubmission Impact, 2009 Fifth IEEE International Conference on e-Science, 2009.
DOI : 10.1109/e-Science.2009.51

X. Besseron, S. Jafar, T. Gautier, and J. Roch, CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in Kaapi, 2006 2nd International Conference on Information & Communication Technologies, pp.3353-3358, 2006.
DOI : 10.1109/ICTTA.2006.1684955

URL : https://hal.archives-ouvertes.fr/hal-00684864

Y. Xiang, Z. Li, and H. Chen, Optimizing Adaptive Checkpointing Schemes for Grid Workflow Systems, 2006 Fifth International Conference on Grid and Cooperative Computing Workshops, 2006.
DOI : 10.1109/GCCW.2006.69

E. N. Elnozahy and J. S. Plank, Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery, IEEE Transactions on Dependable and Secure Computing, vol.1, issue.2, 2004.

N. Naksinehaboon, Y. Liu, C. Leangsuksun, R. Nassar, M. Paun, and S. L. Scott, Reliability-Aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008.
DOI : 10.1109/CCGRID.2008.109

G. M. Weiss and H. Hirsh, Learning to Predict Rare Events in Event Sequences, Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, 1998.

R. Gupta, P. Beckman, H. Park, E. Lusk, P. Hargrove et al., CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems, 2009 International Conference on Parallel Processing, 2009.
DOI : 10.1109/ICPP.2009.20
