Scaling and optimizing the Gysela code on a cluster of many-core processors

Guillaume Latu; Yuuichi Asahi; Julien Bigot; Tamás Fehér; Virginie Grandgirard

Conference paper. Year: 2018


Guillaume Latu

Role: Author
PersonId: 1275143
IdHAL: guillaume-latu
ORCID: 0009-0001-7274-1305

Institut de Recherche sur la Fusion par confinement Magnétique

Yuuichi Asahi

Role: Author

Julien Bigot

Role: Author
PersonId: 2024
IdHAL: julien-bigot
ORCID: 0000-0002-0015-4304
IdRef: 154771996

Maison de la Simulation

Tamás Fehér

Role: Author

Max-Planck-Institut für Plasmaphysik [Garching]

Virginie Grandgirard

Role: Author
PersonId: 1167089
ORCID: 0000-0001-7821-9107
IdRef: 177044276

Institut de Recherche sur la Fusion par confinement Magnétique

Abstract

The current generation of the Xeon Phi Knights Landing (KNL) processor provides a highly multi-threaded environment on which standard programming models such as MPI/OpenMP can be used. This hardware offers both high memory bandwidth and large computing resources, and is currently available at computing facilities. Many factors affect the performance that applications achieve: one key point is the efficient use of the SIMD vector units, another is the memory access pattern. Vectorization and optimization work was therefore carried out on a plasma turbulence application, Gysela. Several techniques were used: loop splitting, inlining, grouping a set of LU solve operations, removing conditionals and some loop nests, auto-tuning one computation kernel, and changing a key numerical scheme (Lagrange interpolation instead of cubic splines). As a result, KNL execution times were reduced by up to a factor of 3 in some configurations. This effort also yielded a speedup of 2x on the Broadwell architecture and 3x on Skylake. Good scalability was obtained up to a few thousand cores in a strong scaling experiment. Incremental work on vectorizing the Gysela code brought a large payoff without resorting to assembly code or low-level intrinsics.
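The last technique in the list, Lagrange interpolation instead of cubic splines, can be illustrated with a minimal Python sketch. One plausible reason the change helps is locality: each interpolated value depends only on a small fixed stencil of neighbouring grid points, whereas cubic splines first require solving a linear system for the spline coefficients. The degree (3), the uniform periodic grid, and the names `lagrange4_weights` and `interpolate_periodic` are assumptions chosen for illustration; the abstract does not state the order or stencil used in Gysela.

```python
import math


def lagrange4_weights(s):
    """Cubic Lagrange weights for nodes at offsets -1, 0, 1, 2.

    `s` is the position inside the cell, in grid units, with 0 <= s < 1.
    The four weights always sum to 1.
    """
    return (
        -s * (s - 1.0) * (s - 2.0) / 6.0,
        (s + 1.0) * (s - 1.0) * (s - 2.0) / 2.0,
        -(s + 1.0) * s * (s - 2.0) / 2.0,
        (s + 1.0) * s * (s - 1.0) / 6.0,
    )


def interpolate_periodic(values, x, dx):
    """Interpolate grid samples `values` (spacing `dx`, periodic) at position x.

    Only four neighbouring samples are read: no global solve is needed,
    unlike a cubic spline, whose coefficients come from a linear system.
    """
    n = len(values)
    cell = math.floor(x / dx)      # index of the grid point just left of x
    s = x / dx - cell              # fractional offset inside the cell
    w = lagrange4_weights(s)
    # Hypothetical periodic wrap-around; other boundary treatments are possible.
    return sum(w[k] * values[(cell - 1 + k) % n] for k in range(4))
```

Because a cubic Lagrange polynomial reproduces any cubic exactly, sampling x**3 on a grid and interpolating at an interior point returns x**3 up to rounding, which makes a convenient sanity check for such a kernel.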

Keywords

SIMD; KNL; plasma physics; vectorization; many-core

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]; Plasma Physics [physics.plasm-ph]

Main file

wamca18_gl.pdf (195.68 KB)

Origin: Files produced by the author(s)

Contributor: Guillaume Latu

https://inria.hal.science/hal-01719208

Submitted on: Monday, October 1, 2018, 10:52:26

Last modified on: Wednesday, April 3, 2024, 11:08:06

Long-term archiving on: Wednesday, January 2, 2019, 13:18:45

Dates and versions

hal-01719208, version 1 (28-02-2018)

hal-01719208, version 2 (01-10-2018)

Identifiers

HAL Id: hal-01719208, version 2

Cite

Guillaume Latu, Yuuichi Asahi, Julien Bigot, Tamás Fehér, Virginie Grandgirard. Scaling and optimizing the Gysela code on a cluster of many-core processors. SBAC-PAD 2018, WAMCA workshop, Sep 2018, Lyon, France. ⟨hal-01719208v2⟩


Collections

CEA CNRS INRIA MDLS DSM-IRFM UVSQ GENCI CEA-UPSAY UNIV-PARIS-SACLAY CEA-DRF CEA-CAD GS-COMPUTER-SCIENCE

