Journal article, Journal of Parallel and Distributed Computing, Year: 2022

## Simulation-based Optimization and Sensibility Analysis of MPI Applications: Variability Matters

Tom Cornebize
Arnaud Legrand

#### Abstract

Finely tuning MPI applications and understanding the influence of key parameters (number of processes, granularity, collective operation algorithms, virtual topology, and process placement) is critical to obtaining good performance on supercomputers. Given the high resource consumption of running applications at scale, doing so solely to optimize their performance is particularly costly. Having inexpensive but faithful predictions of expected performance could be a great help for researchers and system administrators. The methodology we propose decouples the complexity of the platform, which is captured through statistical models of the performance of its main components (MPI communications, BLAS operations), from the complexity of adaptive applications by emulating the application and skipping regular non-MPI parts of the code. We demonstrate the capability of our method with High-Performance Linpack (HPL), the benchmark used to rank supercomputers in the TOP500, which requires careful tuning. We briefly present (1) how the open-source version of HPL can be slightly modified to allow a fast emulation on a single commodity server at the scale of a supercomputer. Then we present (2) an extensive (in)validation study that compares simulation with real experiments and demonstrates our ability to consistently predict the performance of HPL within a few percent. This study allows us to identify the main modeling pitfalls (e.g., spatial and temporal node variability or network heterogeneity and irregular behavior) that need to be considered. Last, we show (3) how our "surrogate" allows studying several subtle HPL parameter optimization problems while accounting for uncertainty about the platform.

### Dates and versions

hal-03141988, version 1 (15-02-2021)
hal-03141988, version 2 (06-01-2022)

### Identifiers

• HAL Id: hal-03141988, version 2
• ARXIV :
• DOI: 10.1016/j.jpdc.2022.04.002

### Cite

Tom Cornebize, Arnaud Legrand. Simulation-based Optimization and Sensibility Analysis of MPI Applications: Variability Matters. Journal of Parallel and Distributed Computing, 2022, ⟨10.1016/j.jpdc.2022.04.002⟩. ⟨hal-03141988v2⟩
