Scaling and optimizing the Gysela code on a cluster of many-core processors

Abstract: The current generation of the Xeon Phi Knights Landing (KNL) processor provides a highly multi-threaded environment on which standard programming models such as MPI/OpenMP can be used. This hardware offers both large memory bandwidth and large computing resources, and is currently available at computing facilities. Many factors affect the performance achieved by applications; one key point is the efficient exploitation of SIMD vector units, another is the memory access pattern. Vectorization and optimization work has therefore been conducted on a plasma turbulence application, Gysela. A set of techniques has been used: loop splitting, inlining, grouping a set of LU solve operations, removing conditionals and some loop nests, auto-tuning of one computation kernel, and changing a key numerical scheme (Lagrange interpolation instead of cubic splines). As a result, KNL execution times have been reduced by up to a factor of 3 in some configurations. This effort has also yielded a speedup of 2x on the Broadwell architecture and 3x on Skylake. Good scalability curves up to a few thousand cores have been obtained in a strong scaling experiment. Incremental work on vectorizing the Gysela code gave a large payoff without resorting to writing assembly code or using low-level intrinsics.
Contributor: Guillaume Latu
Submitted on: Monday, October 1, 2018 - 10:52:26 AM
Last modification on: Thursday, February 7, 2019 - 4:53:50 PM
Document(s) archived on: Wednesday, January 2, 2019 - 1:18:45 PM


Files produced by the author(s)


  • HAL Id: hal-01719208, version 2


Guillaume Latu, Yuuichi Asahi, Julien Bigot, Tamás Fehér, Virginie Grandgirard. Scaling and optimizing the Gysela code on a cluster of many-core processors. SBAC-PAD 2018, WAMCA workshop, Sep 2018, Lyon, France. ⟨hal-01719208v2⟩


