The Opportunities and Challenges of Exascale Computing, 2010.
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525
Application monitoring and checkpointing in HPC, Proceedings of the 50th Annual Southeast Regional Conference, ACM-SE '12, pp.262-267.
DOI : 10.1145/2184512.2184574
A study on data deduplication in HPC storage systems, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
DOI : 10.1109/SC.2012.14
Message logging: pessimistic, optimistic, causal, and optimal, IEEE Transactions on Software Engineering, vol.24, issue.2, pp.149-159, 1998.
DOI : 10.1109/32.666828
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.989-1000, 2011.
DOI : 10.1109/IPDPS.2011.95
URL : https://hal.archives-ouvertes.fr/hal-01121937
Design, modeling, and evaluation of a scalable multi-level checkpointing system, SC '10: Proceedings of the 23rd International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.
Hybrid checkpointing using emerging nonvolatile memories for future exascale systems, ACM Transactions on Architecture and Code Optimization, vol.8, issue.2, pp.6:1-6:29, 2011.
DOI : 10.1145/1970386.1970387
Damaris, CLUSTER '12 -Proceedings of the 2012 IEEE International Conference on Cluster Computing SC '11: Proceedings of 24th International Conference for High Performance Computing, Networking, Storage and Analysis, pp.155-1631, 2011.
DOI : 10.1145/2987371
URL : https://hal.archives-ouvertes.fr/inria-00614597
Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds, Euro-Par '12: 18th International Euro-Par Conference on Parallel Processing, p.313, 2012.
DOI : 10.1007/978-3-642-32820-6_32
URL : https://hal.archives-ouvertes.fr/hal-00703119
On the Viability of Checkpoint Compression for Extreme Scale Fault Tolerance, Euro-Par Workshops, pp.302-311, 2011.
DOI : 10.1007/978-3-642-29740-3_34
A low-bandwidth network file system, ACM SIGOPS Operating Systems Review, vol.35, issue.5, pp.174-187, 2001.
DOI : 10.1145/502059.502052
Fingerprinting by random polynomials, Center for Research in Computing Technology, 1981.
Avoiding the disk bottleneck in the data domain deduplication file system, FAST'08: Proceedings of the 6th USENIX Conference on File and Storage Technologies, pp.18:1-18:14, 2008.
Hydrastor: a scalable secondary storage, FAST '09: Proceedings of the 7th conference on File and storage technologies, pp.197-210, 2009.
Adaptive incremental checkpointing for massively parallel systems, Proceedings of the 18th annual international conference on Supercomputing, ICS '04, pp.277-286, 2004.
DOI : 10.1145/1006209.1006248
Comparing different approaches for incremental checkpointing: The showdown, Linux'11: The 13th Annual Linux Symposium, pp.69-79, 2011.
Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers, ACM/IEEE SC 2005 Conference (SC'05), pp.1-9, 2005.
DOI : 10.1109/SC.2005.76
libhashckpt: Hash-Based Incremental Checkpointing Using GPU's, EuroMPI'11: Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface.
DOI : 10.1007/978-3-642-24449-0_31
PVFS: A parallel file system for Linux clusters, Proceedings of the 4th Annual Linux Showcase and Conference, pp.317-327, 2000.
BlobSeer: Next-generation data management for large scale infrastructures, Journal of Parallel and Distributed Computing, vol.71, issue.2, pp.169-184, 2011.
DOI : 10.1016/j.jpdc.2010.08.004
URL : https://hal.archives-ouvertes.fr/inria-00511414
BlobCR, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.34:1-34:12, 2011.
DOI : 10.1145/2063384.2063429
URL : https://hal.archives-ouvertes.fr/inria-00601865
Performance analysis and optimization of MPI collective operations on multi-core clusters, The Journal of Supercomputing, vol.39, issue.3, pp.141-162, 2012.
DOI : 10.1007/s11227-009-0296-3
Collective algorithms for sub-communicators, ICS '12: Proceedings of the 26th ACM international conference on Supercomputing, pp.225-234, 2012.
A (probably) exact solution to the Birthday Problem, The Ramanujan Journal, vol.4, issue.2, pp.223-238, 2012.
DOI : 10.1007/s11139-011-9343-9
A scalable concurrent malloc(3) implementation for FreeBSD, Proceedings of BSDCan 2006, 2006.
The Maximum Intensity of Tropical Cyclones in Axisymmetric Numerical Model Simulations, Monthly Weather Review, vol.137, issue.6, pp.1770-1789, 2009.
DOI : 10.1175/2008MWR2709.1