Conference papers

Optimization and parallelization of Emedge3D on shared memory architecture

Abstract: This report presents a study of techniques used to speed up a scientific simulation code. The techniques include sequential optimizations as well as parallelization with OpenMP. The work is carried out on two different multicore shared memory architectures, namely a cutting-edge 8x8 core CPU and a more common 2x6 core board. Our target application is representative of many memory-bound codes, and the techniques we present show how to overcome the memory bandwidth limit, which is quickly reached on multi-core or many-core shared memory architectures. To achieve efficient speedups, strategies are applied to lower the computation costs and to maximize the use of processor caches. The optimizations are: minimizing memory accesses, simplifying and reordering computations, and tiling loops. On 12 Intel X5675 cores, the combination of these optimizations yields an execution time 21.6 times faster than that of the original version on one core.
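As an illustration of the loop tiling and OpenMP parallelization named in the abstract, the following is a minimal sketch on a generic memory-bound kernel (a matrix transpose). It is not taken from Emedge3D: the function name `transpose_tiled`, the tile size `TILE`, and the choice of kernel are assumptions made for this example only.

```c
/* Hypothetical tile edge length; in practice it would be tuned to the cache size. */
#define TILE 32

/* Transpose the n x n row-major matrix src into dst, one tile at a time.
 * Working on TILE x TILE blocks keeps the lines touched in src and dst
 * resident in cache while they are reused, instead of striding through
 * the whole of dst for every row of src. */
void transpose_tiled(int n, const double *src, double *dst)
{
    /* Each thread processes whole rows of tiles, so no two threads write
     * the same element of dst. Without OpenMP the pragma is ignored and
     * the loop runs sequentially. */
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < n; ii += TILE) {
        for (int jj = 0; jj < n; jj += TILE) {
            /* Clamp the tile bounds when n is not a multiple of TILE. */
            int i_end = (ii + TILE < n) ? ii + TILE : n;
            int j_end = (jj + TILE < n) ? jj + TILE : n;
            for (int i = ii; i < i_end; i++)
                for (int j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

With GCC or Clang, compiling with `-fopenmp` enables the parallel loop. The benefit of tiling depends on the matrix size, the tile size, and the cache hierarchy, so any speedup has to be measured on the target machine.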

Contributor: Nicolas Crouseilles
Submitted on : Monday, July 29, 2013 - 2:18:15 PM
Last modification on : Tuesday, December 8, 2020 - 9:49:53 AM
Long-term archiving on: Wednesday, October 30, 2013 - 4:12:37 AM


Files produced by the author(s)



Matthieu Kuhn, Guillaume Latu, Stéphane Genaud, Nicolas Crouseilles. Optimization and parallelization of Emedge3D on shared memory architecture. IEEE, Sep 2013, Timisoara, Romania. pp.503-510, ⟨10.1109/SYNASC2013.72⟩. ⟨hal-00848869⟩


