Multiscale computing in the exascale era, Journal of Computational Science, pp.15-25, 2017. ,
Multi-scale HPC system for multi-scale discrete simulation-Development and application of a supercomputer with 1 Petaflops peak performance in single precision, pp.332-335, 2009. ,
Toward Understanding I/O Behavior in HPC Workflows, 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS), 2018. ,
, European Technology Platform for High Performance Computing, ETP4HPC Strategic Research Agenda: Achieving HPC Leadership in Europe, 2013.
, European Technology Platform for High Performance Computing, 2015.
, Strategic Research Agenda 2017: European Multi-annual HPC Technology Roadmap, 2017.
, Eurolab-4-HPC Long-Term Vision on High-Performance Computing, 2017.
, The Opportunities and Challenges of Exascale Computing: Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee", 2010.
, Exascale Programming Challenges: Report of the 2011 Workshop on Exascale Programming Challenges, 2011.
, Preliminary Conceptual Design for an Exascale Computing Initiative, 2014.
, Top Ten Exascale Research Challenges: DOE ASCAC Subcommittee Report, 2014.
Toward Exascale Resilience, 2009. ,
Toward Exascale Resilience: 2014 Update, 2014. ,
Basic concepts and taxonomy of dependable and secure computing, IEEE Transactions on Dependable and Secure Computing, pp.11-33, 2004. ,
Architecture Design for Soft Errors, 2008. ,
How To Kill A Supercomputer: Dirty Power, Cosmic Rays, and Bad Solder, IEEE Spectrum, 2013. ,
, Intel Xeon Processor E7 Family: Reliability, Availability, and Serviceability Advanced data integrity and resiliency support for mission-critical deployments, 2011.
, Intel Xeon Processor E7-8800/4800/2800 v2 Product Family Based Platform Reliability, Availability and Serviceability (RAS) Integration and Validation Guide, 2014.
, Intel Corporation, New Reliability, Availability, and Serviceability (RAS) Features in the Intel Xeon Processor Family, 2017.
, Reliability, Availability, & Serviceability (RAS) of Intel Infrastructure Management. Technologies Feature Support. Feature Brief, 2017.
, Intel Xeon Scalable Platform. Product Brief, 2017.
, Intel Corporation, Intel Product Quick Reference Matrix -Servers", 2018.
, Intel Xeon Processor Scalable Family. Datasheet, Volume One: Electrical", 2018.
, ARM Reliability, Availability, and Serviceability (RAS) Specification ARMv8, for the ARMv8-A architecture profile, 2017.
Ampere 64-bit Arm Processor. Product brief", 2018. ,
Bullion S4 the most advanced workspace for fast data. Fact sheet, 2015. ,
Bull Sequana S series. Technical specification, 2017. ,
Advanced Reliability for Intel Xeon Processors on Dell PowerEdge Servers, Technical White Paper, 2010. ,
PowerEdge R930, 2016. ,
Five Ways to Ensure Reliability, Availability, and Serviceability in Your Enterprise Environment, 2016. ,
Avoiding server downtime from hardware errors in system memory with HP Memory Quarantine, 2012. ,
Reliability, Availability, and Serviceability. Features of the IBM eX5 Portfolio ,
, Lenovo X6 Server RAS Features", 2018.
, RAS Features of the Lenovo ThinkSystem SR950 and SR850", 2018.
Always-on" reliability on x86, 2018. ,
, Oracle Server X5-4 System Architecture, 2016.
Five Highlights of the ThinkSystem SR950", 2018. ,
, Oracle Server X7-2 and Oracle Server X7-2L System Architecture. White paper, 2017.
, Oracle Server X7-2. Data sheet, 2017.
, Oracle Server X7-8 Eight-Socket Configuration. Data sheet, 2017.
, Memory RAS Configuration. User's guide, 2017.
, Device Reliability Report. UG116 (v10.9, 2018.
, 7 Series FPGAs Memory Resources. UG473 (v1.12), 2016.
Intel Stratix 10 Embedded Memory User Guide, v18.1", 2018. ,
AN 737: SEU Detection and Recovery in Intel Arria 10 Devices", 2018. ,
AN 711: Power Reduction Features in Intel Arria 10 Devices", 2018. ,
Reliability Report (MNL-1085), 2017. ,
Predicting Faults in High Performance Computing Systems: An In-Depth Survey of the State-of-the-Practice, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2019. ,
Reducing False Node Failure Predictions in HPC, The 26th IEEE International Conference on High Performance Computing, Data, and Analytics, 2019. ,
ARM HPC Ecosystem and the Reemergence of Vectors, Proceedings of the Computing Frontiers Conference, 2017. ,
Efficiency modeling and exploration of 64-bit ARM compute nodes for exascale, Microprocess. Microsyst, vol.53, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01586191