Efficient Data Structures for a Hybrid Parallel and Vectorized Particle-in-Cell Code

Yann Barsamian 1, 2, Sever Adrian Hirstoaga 3, 4, Eric Violard 1, 2
2 CAMUS - Compilation pour les Architectures MUlti-coeurS
Inria Nancy - Grand Est, ICube - Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie
4 TONUS - TOkamaks and NUmerical Simulations
IRMA - Institut de Recherche Mathématique Avancée, Inria Nancy - Grand Est
Abstract: This work combines several optimization techniques to achieve high performance with automatic vectorization and hybrid MPI/OpenMP parallelism in a Particle-in-Cell (PIC) code. The domain of application is plasma physics: the code simulates 2d2v Vlasov-Poisson systems on Cartesian grids with periodic boundary conditions. Overall, our code processes 65 million particles/second per core on Intel Haswell (without hyper-threading) and achieves good weak scaling up to 0.4 trillion particles on 8,192 cores. The optimizations mainly consist of using (i) a structure of arrays for the particles, (ii) an efficient data structure for the electric field and the charge density, and (iii) code written so that the charge accumulation and the update of the positions are automatically vectorized. In particular, we use space-filling curves to enhance data locality while enabling vectorization: starting from a redundant cell-based data structure for the electric field and for the charge density, we compare several space-filling curves for an efficient ordering of these data, and we obtain a 36% reduction in the number of L2 and L3 cache misses when using a Morton curve instead of the classical row-major one. In addition, by writing the position-update code in a specific way, we achieve a 31% time improvement in that step. The optimizations bring an overall gain of 42% in execution time with respect to a standard code. The particle loops are parallelized simply, using both distributed and shared memory paradigms, without domain decomposition. We explain the weak and strong scaling of the code, which is bounded, as expected, by the overhead of the MPI communications.
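As an illustration of points (i) and (ii) of the abstract, the following is a minimal sketch in C, not the authors' code. It shows a structure of arrays for 2d2v particles with a simple position update, and a Morton (Z-order) cell index next to the classical row-major one. All names (`particles_soa`, `push_positions`, `morton_index`, `row_major_index`) are hypothetical, the Morton encoding assumes cell indices that fit in 16 bits, and the update omits the periodic boundary handling that the actual code needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Structure of arrays: one contiguous array per particle attribute,
 * so that a loop over particles reads each attribute with unit stride. */
typedef struct {
    size_t n;        /* number of particles */
    double *x, *y;   /* positions */
    double *vx, *vy; /* velocities */
} particles_soa;

/* Simple position update x += dt * v (no periodic wrap in this sketch).
 * The restrict-qualified local pointers tell the compiler that the
 * arrays do not overlap, which helps automatic vectorization. */
void push_positions(particles_soa *p, double dt) {
    double *restrict x = p->x;
    double *restrict y = p->y;
    const double *restrict vx = p->vx;
    const double *restrict vy = p->vy;
    for (size_t i = 0; i < p->n; i++) {
        x[i] += dt * vx[i];
        y[i] += dt * vy[i];
    }
}

/* Spread the lower 16 bits of v so that bit k moves to bit 2k. */
uint32_t spread_bits16(uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Morton (Z-order) index of cell (ix, iy): the bits of ix go to the
 * even positions and the bits of iy to the odd positions. */
uint32_t morton_index(uint32_t ix, uint32_t iy) {
    return spread_bits16(ix) | (spread_bits16(iy) << 1);
}

/* Classical row-major index for a grid with ncy cells along y. */
uint32_t row_major_index(uint32_t ix, uint32_t iy, uint32_t ncy) {
    return ix * ncy + iy;
}
```

With the Morton ordering, the four cells of any aligned 2x2 block get consecutive indices (for example, cells (0,0), (1,0), (0,1), (1,1) map to 0, 1, 2, 3), whereas in row-major order two cells that are neighbors along x are `ncy` entries apart. This is the kind of locality difference the abstract's cache-miss comparison refers to.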

https://hal.inria.fr/hal-01504645
Contributor: S. A. Hirstoaga
Submitted on: Thursday, June 29, 2017 - 17:51:17
Last modified on: Thursday, December 14, 2017 - 09:57:02

File

pdsec3.pdf
Files produced by the author(s)

Citation

Yann Barsamian, Sever Adrian Hirstoaga, Eric Violard. Efficient Data Structures for a Hybrid Parallel and Vectorized Particle-in-Cell Code. Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017 IEEE International, 2017, pp.1168-1177. 〈http://ieeexplore.ieee.org/document/7965170/〉. 〈10.1109/IPDPSW.2017.74〉. 〈hal-01504645v3〉
