Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method

Emmanuel Agullo (1, 2), Olivier Aumage (3, 2), Berenger Bramas (4), Olivier Coulaud (1, 2), Samuel Pitoiset (3, 2)
1 HiePACS - High-End Parallel Algorithms for Challenging Numerical Simulations
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
3 STORM - STatic Optimizations, Runtime Methods
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: With the advent of complex modern architectures, the low-level paradigms long considered sufficient to build High Performance Computing (HPC) numerical codes have met their limits. The need to achieve efficiency and ensure portability while preserving programming tractability on such hardware has prompted the HPC community to design new, higher-level paradigms. The successful ports of fully-featured numerical libraries onto several recent runtime system proposals have indeed shown the benefit of task-based parallelism models in terms of performance portability on complex platforms. However, the common weakness of these projects is that they deeply tie applications to specific, expert-only runtime system APIs. The OpenMP specification, which aims to provide a common means of parallel programming for shared-memory platforms, appears to be a good candidate to address this issue thanks to the task-based constructs introduced in its revision 4.0. The goal of this paper is to assess the effectiveness and limits of this support for designing a high-performance numerical library. We illustrate our discussion with the ScalFMM library, which implements state-of-the-art fast multipole method (FMM) algorithms and which we have deeply redesigned to exploit the most advanced features provided by OpenMP 4. We show that OpenMP 4 allows for significant performance improvements over previous OpenMP revisions on recent multicore processors. We furthermore propose extensions to the OpenMP 4 standard and show how they can enhance FMM performance. To assess this claim, we have implemented this support within the Klang-Omp source-to-source compiler, which translates OpenMP directives into calls to the StarPU task-based runtime system.
This study shows that we can take advantage of the advanced capabilities of a fully-featured runtime system without resorting to a specific, native runtime port, hence bridging the gap between the OpenMP standard and the very high performance so far reserved for expert-only runtime system APIs.
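To make the OpenMP 4.0 task-based constructs mentioned above concrete, here is a minimal sketch of an FMM-style upward pass (P2M on leaves, then M2M toward the root) written with `depend` clauses. It rests on loud simplifying assumptions: a complete binary tree stored heap-style stands in for the FMM octree, and a cell's "multipole" is just the total mass below it. The names `p2m`, `m2m`, `upward_pass`, `multipole`, and the tree layout are hypothetical illustrations, not ScalFMM's API or the authors' implementation.

```c
/* Toy FMM upward pass expressed with OpenMP 4.0 task dependences.
 * ASSUMPTION (hypothetical): a complete binary tree stored heap-style
 * (children of cell i are 2i+1 and 2i+2) replaces the FMM octree, and a
 * cell's "multipole" is simply the total mass of the particles below it. */

#define LEVELS 4                                /* 1 + 2 + 4 + 8 = 15 cells */
#define NODES ((1 << LEVELS) - 1)               /* total number of cells    */
#define FIRST_LEAF ((1 << (LEVELS - 1)) - 1)    /* leaves are cells 7..14   */
#define PARTICLES_PER_LEAF 4
#define NUM_PARTICLES ((NODES - FIRST_LEAF) * PARTICLES_PER_LEAF) /* 32 */

static double multipole[NODES];

/* Toy P2M: aggregate the masses of one leaf's particles. */
static void p2m(int leaf, const double *mass) {
    double sum = 0.0;
    for (int k = 0; k < PARTICLES_PER_LEAF; ++k)
        sum += mass[k];
    multipole[leaf] = sum;
}

/* Toy M2M: a parent's expansion combines those of its two children. */
static void m2m(int parent) {
    multipole[parent] = multipole[2 * parent + 1] + multipole[2 * parent + 2];
}

/* Builds the task graph and returns the root value once all tasks finish. */
double upward_pass(const double *masses) {
    #pragma omp parallel
    #pragma omp single
    {
        /* One independent P2M task per leaf: each writes only its own cell. */
        for (int leaf = FIRST_LEAF; leaf < NODES; ++leaf) {
            const double *m = masses + (leaf - FIRST_LEAF) * PARTICLES_PER_LEAF;
            #pragma omp task depend(out: multipole[leaf]) firstprivate(leaf, m)
            p2m(leaf, m);
        }
        /* One M2M task per inner cell, created bottom-up; the depend clauses
         * make it wait only for the tasks that produce its two children. */
        for (int parent = FIRST_LEAF - 1; parent >= 0; --parent) {
            #pragma omp task depend(in: multipole[2 * parent + 1], multipole[2 * parent + 2]) \
                             depend(out: multipole[parent]) firstprivate(parent)
            m2m(parent);
        }
        #pragma omp taskwait  /* wait for every task created above */
    }
    return multipole[0];
}
```

Calling `upward_pass` on 32 particles with masses 1, 2, ..., 32 returns 528, the total mass. Built with OpenMP enabled (for instance `gcc -fopenmp`), independent leaves and disjoint subtrees may run concurrently; without OpenMP the pragmas are ignored and the loops run in program order, which already respects every dependence. Compared with a level-by-level loop that synchronizes after each tree level, the `depend` clauses let an M2M task start as soon as its own two children are ready.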
Cited literature: 16 references

https://hal.inria.fr/hal-01372022
Contributor: Olivier Coulaud
Submitted on: Monday, September 26, 2016 - 5:07:08 PM
Last modification on: Tuesday, May 14, 2019 - 11:38:08 AM

File: RR-8953.pdf (files produced by the author(s))

Identifiers

  • HAL Id: hal-01372022, version 1

Citation

Emmanuel Agullo, Olivier Aumage, Berenger Bramas, Olivier Coulaud, Samuel Pitoiset. Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method. [Research Report] RR-8953, Inria. 2016, pp.49. ⟨hal-01372022⟩
