J. C. Ho, C. Wang, and F. C. Lau, Scalable Group-Based Checkpoint/Restart for Large-Scale Message-Passing Systems, 22nd IEEE International Parallel and Distributed Processing Symposium, 2008.
DOI : 10.1109/ipdps.2008.4536302
URL : http://www.cs.hku.hk/~clwang/papers/ipdps2008-GroupCKP.pdf

S. Kamil, J. Shalf, L. Oliker, and D. Skinner, Understanding ultra-scale application communication requirements, Proceedings of the 2005 IEEE International Symposium on Workload Characterization, pp.178-187, 2005.
DOI : 10.1109/iiswc.2005.1526015
URL : https://digital.library.unt.edu/ark:/67531/metadc902911/m2/1/high_res_d/925416.pdf

G. Karypis and V. Kumar, MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices Version 4.0, Univ. Minnesota, 1998.

E. Meneses, C. L. Mendes, and L. V. Kale, Team-based Message Logging: Preliminary Results, 3rd Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, 2010.
DOI : 10.1109/ccgrid.2010.110

S. Monnet, C. Morin, and R. Badrinath, Hybrid Checkpointing for Parallel Applications in Cluster Federations, Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid (CCGRID'04), pp.773-782, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00071577

F. Pellegrini, SCOTCH 5.1 User's Guide. LaBRI, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00410327

R. Riesen, Communication Patterns, Workshop on Communication Architecture for Clusters CAC'06, 2006.

R. Thakur, R. Rabenseifner, and W. Gropp, Optimization of Collective Communication Operations in MPICH, International Journal of High Performance Computing Applications, vol.19, issue.1, pp.49-66, 2005.

J. S. Vetter and F. Mueller, Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures, Journal of Parallel and Distributed Computing, vol.63, pp.853-865, 2003.

J. Yang, K. F. Li, W. Li, and D. Zhang, Trading Off Logging Overhead and Coordinating Overhead to Achieve Efficient Rollback Recovery, Concurrency and Computation: Practice and Experience, vol.21, pp.819-853, 2009.
DOI : 10.1002/cpe.1364