Collective I/O Performance on the Santos Dumont Supercomputer

Abstract: The historical gap between processing and data access speeds causes many applications to spend a large portion of their execution time on I/O operations. For a large-scale, expensive supercomputer, it is important to ensure that applications achieve the best possible I/O performance so that the machine is used efficiently. In this paper, we evaluate the I/O infrastructure of the Santos Dumont supercomputer, the largest in Latin America. More specifically, we investigate the performance of collective I/O operations. By analyzing a scientific application that runs on the machine, we identify large performance differences between the available MPI implementations. We then study the observed phenomenon further using the BT-IO and IOR benchmarks, in addition to a custom microbenchmark. We conclude that the customized MPI implementation by Bull (used by more than 20% of the jobs) presents the worst performance for small collective write operations. Our results are being used to help Santos Dumont users achieve the best performance for their applications. Additionally, by investigating the observed phenomenon, we provide information to help improve future MPI-IO collective write implementations.

https://hal.inria.fr/hal-01711359
Contributor: Francieli Zanon Boito
Submitted on: Saturday, February 17, 2018 - 1:18:32 PM
Last modification on: Friday, October 25, 2019 - 1:31:44 AM
Long-term archiving on: Tuesday, May 8, 2018 - 12:27:42 AM

File

pdp2018.pdf
Files produced by the author(s)

Citation

André Ramos Carneiro, Jean Luca Bez, Francieli Zanon Boito, Bruno Fagundes, Carla Osthoff, et al. Collective I/O Performance on the Santos Dumont Supercomputer. PDP 2018 - 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing, Mar 2018, Cambridge, United Kingdom. pp. 45-52, ⟨10.1109/PDP2018.2018.00015⟩. ⟨hal-01711359⟩
