Conference papers

Efficient Data Structures for a Hybrid Parallel and Vectorized Particle-in-Cell Code

Yann A Barsamian 1, 2 Sever Adrian Hirstoaga 3, 4 Eric Violard 1, 2 
2 CAMUS - Compilation pour les Architectures MUlti-coeurS
Inria Nancy - Grand Est, ICube - Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie
4 TONUS - TOkamaks and NUmerical Simulations
IRMA - Institut de Recherche Mathématique Avancée, Inria Nancy - Grand Est
Abstract: This work combines several optimization techniques to achieve high performance with automatic vectorization and hybrid MPI/OpenMP parallelism in a Particle-in-Cell (PIC) code. The application domain is plasma physics: the code simulates 2d2v Vlasov-Poisson systems on Cartesian grids with periodic boundary conditions. Overall, our code processes 65 million particles per second per core on Intel Haswell (without hyper-threading) and achieves good weak scaling up to 0.4 trillion particles on 8,192 cores. The optimizations mainly consist of (i) a structure of arrays for the particles, (ii) an efficient data structure for the electric field and the charge density, and (iii) code written so that the compiler can automatically vectorize the charge accumulation and the position update. In particular, we use space-filling curves to enhance data locality while enabling vectorization: starting from a redundant cell-based data structure for the electric field and for the charge density, we compare several space-filling curves for ordering these data efficiently, and we obtain a 36% reduction in the number of L2 and L3 cache misses when using a Morton curve instead of the classical row-major one. In addition, a specific formulation of the position-update code yields a 31% time improvement in that step. Together, the optimizations reduce the overall execution time by 42% with respect to a standard code. The particle loops are parallelized with both distributed and shared memory paradigms, without domain decomposition. We explain the weak and strong scaling of the code, which is bounded, as expected, by the overhead of the MPI communications.
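To make the comparison between row-major and Morton orderings concrete, here is a minimal sketch in C of how a 2D cell coordinate can be mapped to a linear index under each ordering. It is not taken from the paper's code: the function names (`spread_bits`, `morton_index`, `row_major_index`) and the 16-bit limit per coordinate are assumptions made for illustration, and the redundant cell-based layout mentioned in the abstract is not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: spread the low 16 bits of x so that bit k of the
 * input lands on bit 2k of the output (odd bits are left at zero). */
static uint32_t spread_bits(uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Morton (Z-order) index of cell (ix, iy): the bits of ix and iy are
 * interleaved, with ix on the even positions and iy on the odd ones.
 * Assumption for this sketch: both coordinates fit in 16 bits. */
static uint32_t morton_index(uint32_t ix, uint32_t iy) {
    assert(ix < 65536u && iy < 65536u);
    return spread_bits(ix) | (spread_bits(iy) << 1);
}

/* Classical row-major index for a grid with ncy cells along y. */
static uint32_t row_major_index(uint32_t ix, uint32_t iy, uint32_t ncy) {
    return ix * ncy + iy;
}
```

With the Morton mapping, the four cells of any aligned 2x2 block receive consecutive indices, so a particle moving to a neighboring cell often stays close in memory. With the row-major mapping, a step along x always jumps by `ncy` entries.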
Contributor : Sever Hirstoaga
Submitted on : Thursday, June 29, 2017 - 5:51:17 PM
Last modification on : Tuesday, October 25, 2022 - 4:24:52 PM
Long-term archiving on : Monday, January 22, 2018 - 7:39:51 PM





Yann A Barsamian, Sever Adrian Hirstoaga, Eric Violard. Efficient Data Structures for a Hybrid Parallel and Vectorized Particle-in-Cell Code. IPDPSW 2017 - IEEE International Parallel and Distributed Processing Symposium Workshops , May 2017, Lake Buena Vista, FL, United States. pp.1168-1177, ⟨10.1109/IPDPSW.2017.74⟩. ⟨hal-01504645v3⟩


