R. Bolze, F. Cappello, E. Caron, M. Daydé, F. Desprez et al., Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed, International Journal of High Performance Computing Applications, vol.20, issue.4, pp.481-494, 2006.
DOI : 10.1177/1094342006070078

URL : https://hal.archives-ouvertes.fr/hal-00684943

A. Iosup, M. Jan, O. Sonmez, and D. Epema, On the dynamic resource availability in grids, 2007 8th IEEE/ACM International Conference on Grid Computing, pp.26-33, 2007.
DOI : 10.1109/GRID.2007.4354112

I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Grimshaw et al., The Open Grid Services Architecture, Version 1.5, Tech. Rep. GFD-R.80, Open Grid Forum, 2006.

L. Rilling, Vigne: Towards a Self-healing Grid Operating System, Proceedings of Euro-Par 2006, pp.437-447, 2006.
DOI : 10.1007/11823285_45

]. R. Resnick, A Modern Taxonomy of High Availability, 1996.

R. Pennington, Terascale clusters and the TeraGrid, Proceedings of 6th International Conference/Exhibition on High Performance Computing in Asia Pacific Region, pp.407-413, 2002.

D. P. Anderson, J. Cobb, E. Korpela, M. Lebofsky, and D. Werthimer, SETI@home: an experiment in public-resource computing, Communications of the ACM, vol.45, issue.11, pp.56-61, 2002.
DOI : 10.1145/581571.581573

P. Eerola, B. Kónya, O. Smirnova, T. Ekelöf, M. Ellert et al., The Nordugrid production grid infrastructure, status and plans, Proceedings. First Latin American Web Congress, p.158, 2003.
DOI : 10.1109/GRID.2003.1261711

T. Cortes, C. Franke, Y. Jégou, T. Kielmann, D. Laforenza et al., XtreemOS: a Vision for a Grid Operating System, tech. rep., XtreemOS, 2008.

C. Morin, XtreemOS: A Grid Operating System Making your Computer Ready for Participating in Virtual Organizations, 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC'07), pp.393-402, 2007.
DOI : 10.1109/ISORC.2007.62

URL : https://hal.archives-ouvertes.fr/hal-01271216

I. Foster, Globus toolkit version 4: Software for service-oriented systems, IFIP International Conference on Network and Parallel Computing, pp.2-13, 2005.

E. Jeanvoine, C. Morin, and D. Leprince, Vigne: Executing Easily and Efficiently a Wide Range of Distributed Applications in Grids, Proceedings of Euro-Par, pp.394-403, 2007.
DOI : 10.1007/978-3-540-74466-5_43

URL : https://hal.archives-ouvertes.fr/hal-00689008

A. Rowstron and P. Druschel, Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems, Proceedings of International Middleware Conference, pp.329-350, 2001.
DOI : 10.1007/3-540-45518-3_18

S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling churn in a DHT, Proceedings of the USENIX Annual Technical Conference, pp.127-140, 2004.

D. Pulsipher and L. Ovoca, Job Submission Description Language (JSDL) Specification , Version 1.0

S. Mena, A. Schiper, and P. Wojciechowski, A Step Towards a New Generation of Group Communication Systems, Proceedings of International Middleware Conference, pp.414-432, 2003.
DOI : 10.1007/3-540-44892-6_21

R. Nath, Fault tolerance of the application manager in Vigne, tech. rep, 2008.

M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso, Understanding replication in databases and distributed systems, Proceedings 20th IEEE International Conference on Distributed Computing Systems, pp.464-474, 2000.
DOI : 10.1109/ICDCS.2000.840959

F. B. Schneider, Implementing fault-tolerant services using the state machine approach: a tutorial, ACM Computing Surveys, vol.22, issue.4, pp.299-319, 1990.
DOI : 10.1145/98163.98167

A. Schiper, Dynamic group communication Distributed Computing, pp.359-374, 2006.

N. Budhiraja, K. Marzullo, F. B. Schneider, and S. Toueg, The Primary-Backup Approach, pp.199-216, 1993.

R. Guerraoui and A. Schiper, Software-based replication for fault tolerance, Computer, vol.30, issue.4, pp.68-74, 1997.
DOI : 10.1109/2.585156

D. Powell, I. Bey, and J. Leuridan, Delta Four: A Generic Architecture for Dependable Distributed Computing, 1991.

X. Défago, A. Schiper, and N. Sergent, Semi-passive replication, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281), pp.43-50, 1998.
DOI : 10.1109/RELDIS.1998.740473

M. Chérèque, D. Powell, P. Reynier, J. Richier, and J. Voiron, Active replication in Delta-4, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing, pp.28-37, 1992.
DOI : 10.1109/FTCS.1992.243618

E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, A survey and comparison of peer-to-peer overlay network schemes, IEEE Communications Surveys and Tutorials, vol.7, pp.72-93, 2005.

A. Luckow and B. Schnor, Service replication in Grids: Ensuring consistency in a dynamic, failure-prone environment, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-7, 2008.
DOI : 10.1109/IPDPS.2008.4536211

G. Pierre, T. Schütt, J. Domaschka, and M. Coppola, Highly available and scalable grid services, Proceedings of the Third Workshop on Dependable Distributed Data Management, WDDM '09, pp.18-20, 2009.
DOI : 10.1145/1518691.1518697

X. Zhang, D. Zagorodnov, M. Hiltunen, K. Marzullo, and R. D. Schlichting, Fault-tolerant Grid Services Using Primary-Backup: Feasibility and Performance, Proceedings of the 2004 IEEE International Conference on Cluster Computing, pp.105-114, 2004.

S. Djilali, T. Herault, O. Lodygensky, T. Morlier, G. Fedak et al., RPC-V: Toward Fault-Tolerant RPC for Internet Connected Desktop Grids with Volatile Nodes, Proceedings of the ACM/IEEE SC2004 Conference, pp.39-39, 2004.
DOI : 10.1109/SC.2004.51

URL : https://hal.archives-ouvertes.fr/in2p3-00457039

M. V. Reddy, A. V. Srinivas, T. Gopinath, and D. Janakiram, Vishwa: A reconfigurable P2P middleware for Grid Computations, 2006 International Conference on Parallel Processing (ICPP'06), pp.381-390, 2006.
DOI : 10.1109/ICPP.2006.75

N. Drost, R. V. Van-nieuwpoort, and H. Bal, Simple Locality-Aware Coallocation in Peer-to-Peer Supercomputing, Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID '06), pp.14-24, 2006.

X. Défago, A. Schiper, and P. Urbán, Total order broadcast and multicast algorithms, ACM Computing Surveys, vol.36, issue.4, pp.372-421, 2004.
DOI : 10.1145/1041680.1041682

T. Chandra and S. Toueg, Unreliable failure detectors for reliable distributed systems, Journal of the ACM, vol.43, issue.2, pp.225-267, 1996.
DOI : 10.1145/226643.226647

B. Temkow, A. Bosneag, X. Li, and M. Brockmeyer, PaxonDHT: achieving consensus in distributed hash tables, International Symposium on Applications and the Internet (SAINT'06), pp.236-244, 2006.
DOI : 10.1109/SAINT.2006.48

L. Lamport, The part-time parliament, ACM Transactions on Computer Systems, vol.16, issue.2, pp.133-169, 1998.
DOI : 10.1145/279227.279229

A. Kota, U. Tatsuya, S. Masanori, I. Hayato, and M. Toshio, Toward Fault- Tolerant P2P Systems: Constructing a Stable Virtual Peer from Multiple Unstable Peers, Proceedings of The First International Conference on Advances in P2P Systems (AP2PS '09), pp.104-110, 2009.

A. Muthitacharoen, S. Gilbert, and R. Morris, Etna: A Fault-tolerant Algorithm for Atomic Mutable DHT Data, 2005.

U. Bartlang and J. P. Muller, DhtFlex: A Flexible Approach to Enable Efficient Atomic Data Management Tailored for Structured Peer-to-Peer Overlays, 2008 Third International Conference on Internet and Web Applications and Services, pp.377-384, 2008.
DOI : 10.1109/ICIW.2008.36

M. Moser and S. Haridi, Atomic Commitment in Transactional DHTs, Proceedings of the CoreGRID Symposium, p.151, 2007.
DOI : 10.1007/978-0-387-72498-0_14

R. Boichat, P. Dutta, S. Frølund, and R. Guerraoui, Deconstructing paxos, ACM SIGACT News, vol.34, issue.1, pp.47-67, 2003.
DOI : 10.1145/637437.637447

P. Urban and A. Schiper, Comparing the Performance of Two Consensus Algorithms with Centralized and Decentralized Communication Schemes, Tech. Rep. LSR-REPORT, 2004.

T. Chandra, V. Hadzilacos, and S. Toueg, The weakest failure detector for solving consensus, Journal of the ACM, vol.43, issue.4, pp.685-722, 1996.
DOI : 10.1145/234533.234549

L. Lamport, Fast Paxos, Distributed Computing, pp.79-103, 2006.
DOI : 10.1007/s00446-006-0005-x

Y. Mao, F. Junqueira, and K. Marzullo, Mencius: Building efficient replicated state machines for WANs, Proceedings of the 8th USENIX Symposium on Operating systems Design and Implementation, 2008.