M. Chandy and L. Lamport, Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985.
DOI : 10.1145/214451.214456

J. Duell, P. Hargrove, and E. Roman, The design and implementation of Berkeley Lab's Linux checkpoint/restart, 2003.

M. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, 2002.
DOI : 10.1145/568522.568525

I. Haddad, C. Leangsuksun, and S. L. Scott, HA-OSCAR: the birth of highly available OSCAR, Linux Journal, issue.1151, 2003.

R. Koo and S. Toueg, Checkpointing and Rollback-Recovery for Distributed Systems, IEEE Transactions on Software Engineering, vol.13, issue.1, pp.23-31, 1987.
DOI : 10.1109/TSE.1987.232562

R. Lottiaux and C. Morin, Containers: a sound basis for a true single system image, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid, 2001.
DOI : 10.1109/CCGRID.2001.923177
URL : https://hal.archives-ouvertes.fr/hal-01271232

R. Lottiaux, B. Boissinot, P. Gallard, G. Vallée, and C. Morin, OpenMosix, OpenSSI and Kerrighed: a comparative study, CCGrid 2005: IEEE International Symposium on Cluster Computing and the Grid, 2005.
DOI : 10.1109/CCGRID.2005.1558672
URL : https://hal.archives-ouvertes.fr/hal-01271223

C. Morin and I. Puaut, A survey of recoverable distributed shared virtual memory systems, IEEE Transactions on Parallel and Distributed Systems, vol.8, issue.9, pp.959-969, 1997.
DOI : 10.1109/71.615441

C. Morin, P. Gallard, R. Lottiaux, and G. Vallée, Towards an efficient Single System Image cluster operating system, Future Generation Computer Systems, vol.20, issue.2, 2004.
DOI : 10.1016/s0167-739x(03)00170-5
URL : https://hal.archives-ouvertes.fr/hal-01271205

C. Morin, R. Lottiaux, G. Vallée, P. Gallard, G. Utard et al., Kerrighed: A Single System Image Cluster Operating System for High Performance Computing, Proc. of Euro-Par 2003: Parallel Processing, pp.1291-1294, 2003.
DOI : 10.1007/978-3-540-45209-6_175
URL : https://hal.archives-ouvertes.fr/hal-01271227

E. Pinheiro, Truly-transparent checkpointing of parallel applications

B. Randell, System structure for software fault tolerance, IEEE Transactions on Software Engineering, vol.1, issue.2, pp.221-232, 1975.

D. L. Russell, State Restoration in Systems of Communicating Processes, IEEE Transactions on Software Engineering, vol.6, issue.2, pp.183-194, 1980.
DOI : 10.1109/TSE.1980.230469

Esposito, Mastroserio, and Tortone, OpenMosix approach to build scalable HPC farms with an easy management infrastructure

G. Vallée, R. Lottiaux, D. Margery, C. Morin, and J. Berthou, Ghost process: a sound basis to implement process duplication, migration and checkpoint/restart in Linux clusters, The 4th International Symposium on Parallel and Distributed Computing, 2005.

G. Vallée, R. Lottiaux, L. Rilling, J. Berthou, I. Dutka-Malhen et al., A case for single system image cluster operating systems: the Kerrighed approach, Parallel Processing Letters, vol.13, issue.02, 2003.
DOI : 10.1142/S0129626403001185