K. Antypas, J. Shalf, and H. Wasserman, NERSC-6 Workload Analysis and Benchmark Selection Process, 2008.

K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands et al., The Landscape of Parallel Computing Research: A View from Berkeley, 2006.

D. Bailey, T. Harris, W. Saphir, R. Van der Wijngaart, A. Woo et al., The NAS Parallel Benchmarks 2.0, 1995.

T. N. Bui and C. Jones, Finding good approximate vertex and edge partitions is NP-hard, Information Processing Letters, vol.42, pp.153-159, 1992.

F. Cappello, Fault tolerance in petascale/exascale systems: Current knowledge, challenges and research opportunities, International Journal of High Performance Computing Applications, vol.23, pp.212-226, 2009.

Ü. V. Çatalyürek and C. Aykanat, PaToH: A multilevel hypergraph partitioning tool, 1999.

W. H. Cunningham, Optimal attack and reinforcement of a network, J. ACM, vol.32, pp.549-561, 1985.

J. Daly, A model for predicting the optimum checkpoint interval for restart dumps, Proceedings of the 2003 International Conference on Computational Science, ICCS'03, pp.3-12, 2003.

E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A Survey of Rollback-Recovery Protocols in Message-Passing Systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

A. Guermouche, T. Ropars, E. Brunet, M. Snir, and F. Cappello, Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications, 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS2011), 2011.

J. C. Ho, C. Wang, and F. C. Lau, Scalable Group-Based Checkpoint/Restart for Large-Scale Message-Passing Systems, 22nd IEEE International Parallel and Distributed Processing Symposium, 2008.
DOI: 10.1109/ipdps.2008.4536302
URL: http://www.cs.hku.hk/~clwang/papers/ipdps2008-GroupCKP.pdf

S. Kamil, J. Shalf, L. Oliker, and D. Skinner, Understanding ultra-scale application communication requirements, Proceedings of the 2005 IEEE International Symposium on Workload Characterization, pp.178-187, 2005.
DOI: 10.1109/iiswc.2005.1526015
URL: https://digital.library.unt.edu/ark:/67531/metadc902911/m2/1/high_res_d/925416.pdf

G. Karypis and V. Kumar, MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices, Version 4.0, Univ. Minnesota, 1998.

E. Meneses, C. L. Mendes, and L. V. Kale, Team-based Message Logging: Preliminary Results, 3rd Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, 2010.
DOI: 10.1109/ccgrid.2010.110

S. Monnet, C. Morin, and R. Badrinath, Hybrid Checkpointing for Parallel Applications in Cluster Federations, Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid (CCGRID'04), pp.773-782, 2004.
URL: https://hal.archives-ouvertes.fr/inria-00071577

F. Pellegrini, SCOTCH 5.1 User's Guide, LaBRI, 2008.
URL: https://hal.archives-ouvertes.fr/hal-00410327

R. Riesen, Communication Patterns, Workshop on Communication Architecture for Clusters CAC'06, 2006.

R. Thakur, R. Rabenseifner, and W. Gropp, Optimization of Collective Communication Operations in MPICH, International Journal of High Performance Computing Applications, vol.19, issue.1, pp.49-66, 2005.

J. S. Vetter and F. Mueller, Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures, Journal of Parallel and Distributed Computing, vol.63, pp.853-865, 2003.

J. Yang, K. F. Li, W. Li, and D. Zhang, Trading Off Logging Overhead and Coordinating Overhead to Achieve Efficient Rollback Recovery, Concurrency and Computation: Practice and Experience, vol.21, pp.819-853, 2009.
DOI: 10.1002/cpe.1364