NERSC-6 Workload Analysis and Benchmark Selection Process, 2008.
The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
The NAS Parallel Benchmarks 2.0, 1995.
Finding good approximate vertex and edge partitions is NP-hard, Information Processing Letters, vol.42, pp.153-159, 1992.
Fault tolerance in petascale/exascale systems: Current knowledge, challenges and research opportunities, International Journal of High Performance Computing Applications, vol.23, pp.212-226, 2009.
PaToH: A multilevel hypergraph partitioning tool, 1999.
Optimal attack and reinforcement of a network, J. ACM, vol.32, pp.549-561, 1985.
A model for predicting the optimum checkpoint interval for restart dumps, Proceedings of the 2003 international conference on Computational science, ICCS'03, pp.3-12, 2003.
A Survey of Rollback-Recovery Protocols in Message-Passing Systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
Computers and Intractability; A Guide to the Theory of NP-Completeness, 1979.
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications, 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS2011), 2011.
Scalable Group-Based Checkpoint/Restart for Large-Scale Message-Passing Systems, 22nd IEEE International Parallel and Distributed Processing Symposium, 2008.
DOI : 10.1109/ipdps.2008.4536302
URL : http://www.cs.hku.hk/~clwang/papers/ipdps2008-GroupCKP.pdf
Understanding ultra-scale application communication requirements, Proceedings of the 2005 IEEE International Symposium on Workload Characterization, pp.178-187, 2005.
DOI : 10.1109/iiswc.2005.1526015
URL : https://digital.library.unt.edu/ark:/67531/metadc902911/m2/1/high_res_d/925416.pdf
MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices Version 4.0, Univ. Minnesota, 1998.
Team-based Message Logging: Preliminary Results, 3rd Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, 2010.
DOI : 10.1109/ccgrid.2010.110
Hybrid Checkpointing for Parallel Applications in Cluster Federations, Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid (CCGRID'04), pp.773-782, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00071577
SCOTCH 5.1 User's Guide, LaBRI, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00410327
Communication Patterns, Workshop on Communication Architecture for Clusters CAC'06, 2006.
Optimization of Collective Communication Operations in MPICH, International Journal of High Performance Computing Applications, vol.19, issue.1, pp.49-66, 2005.
Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures, Journal of Parallel and Distributed Computing, vol.63, pp.853-865, 2003.
Trading Off Logging Overhead and Coordinating Overhead to Achieve Efficient Rollback Recovery, Concurrency and Computation: Practice and Experience, vol.21, pp.819-853, 2009.
DOI : 10.1002/cpe.1364