Journal article. IEEE Letters of the Computer Society, 2019

Runtime On-Stack Parallelization of Dependence-Free For-Loops in Binary Programs

Marwa Yusuf, Ahmed El-Mahdy, Erven Rohou

Abstract

With the multicore trend, the need for automatic parallelization is more pronounced, especially for legacy and proprietary code where no source code is available and/or the code is already running and restarting is not an option. In this paper, we engineer a mechanism that transforms, at runtime, a frequently executed for-loop with no data dependencies in a binary program into a parallel loop, using on-stack replacement. The mechanism requires no source code, no debugging information, no static instrumentation or static information, and no restart of the program. It is implemented using the Padrone binary modification system and pthreads, and executes the remaining iterations of the loop in parallel. The mechanism preserves the running program state by extracting the targeted loop into a separate function and copying the current stack frame into the corresponding frames of the created threads. An initial study is conducted on a set of kernels from the PolyBench workload. Experimental results show speedups of 2x to 3.5x from sequential to parallelized code on four cores, which is similar to source-code-level parallelization.
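
The idea in the abstract can be illustrated with a rough source-level analogy in C. This is only a sketch under stated assumptions, not the authors' implementation: the paper works on a running binary through Padrone, which is not shown here, and every name below (`frame_t`, `chunk_t`, `loop_chunk`, `parallelize_remaining`, `scale_array`, `NUM_THREADS`, `switch_at`) is hypothetical. A struct stands in for the live stack frame, the loop body lives in its own function, and each thread receives a private copy of the frame together with its share of the remaining iterations.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4  /* assumption: four workers, matching the four cores in the abstract */

/* Hypothetical stand-in for the stack frame of the function that owns the loop. */
typedef struct {
    double *a;        /* output array */
    const double *b;  /* input array */
    double scale;     /* loop-invariant local */
    size_t n;         /* loop bound */
} frame_t;

/* Per-thread argument: a private copy of the frame plus an iteration range. */
typedef struct {
    frame_t frame;
    size_t begin, end;  /* half-open range [begin, end) */
} chunk_t;

/* The loop body, extracted into a separate function that threads can run. */
static void *loop_chunk(void *arg) {
    chunk_t *c = arg;
    for (size_t i = c->begin; i < c->end; i++)
        c->frame.a[i] = c->frame.scale * c->frame.b[i];
    return NULL;
}

/* Execute the remaining iterations [resume_at, n) on NUM_THREADS threads. */
static void parallelize_remaining(const frame_t *live, size_t resume_at) {
    pthread_t tid[NUM_THREADS];
    chunk_t chunk[NUM_THREADS];
    int created[NUM_THREADS];

    assert(resume_at <= live->n);
    size_t remaining = live->n - resume_at;
    size_t per = (remaining + NUM_THREADS - 1) / NUM_THREADS;  /* ceiling division */

    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = resume_at + (size_t)t * per;
        if (begin > live->n) begin = live->n;
        size_t end = begin + per;
        if (end > live->n) end = live->n;

        chunk[t].frame = *live;  /* copy the current frame into the thread's frame */
        chunk[t].begin = begin;
        chunk[t].end = end;

        created[t] = (pthread_create(&tid[t], NULL, loop_chunk, &chunk[t]) == 0);
        if (!created[t])
            loop_chunk(&chunk[t]);  /* fall back to running this chunk inline */
    }
    for (int t = 0; t < NUM_THREADS; t++)
        if (created[t])
            pthread_join(tid[t], NULL);
}

/* Runs sequentially for the first switch_at iterations, then hands the
 * rest of the loop over to the parallel version, mimicking a mid-loop switch. */
static void scale_array(double *a, const double *b, double scale,
                        size_t n, size_t switch_at) {
    frame_t live = { a, b, scale, n };
    size_t i;
    for (i = 0; i < n && i < switch_at; i++)
        a[i] = scale * b[i];
    parallelize_remaining(&live, i);
}
```

Calling `scale_array(a, b, 2.0, 1000, 137)` runs iterations 0 to 136 sequentially and splits the remaining 863 across the worker threads. Copying the frame by value is safe here only because the loop has no cross-iteration dependencies and each thread writes a disjoint slice of `a`, which mirrors the dependence-free restriction stated in the abstract.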

Main file

LOCS_binary_parallelization.pdf (227.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02061340, version 1 (08-03-2019)

Identifiers

HAL Id: hal-02061340
DOI: 10.1109/LOCS.2019.2896559

Cite

Marwa Yusuf, Ahmed El-Mahdy, Erven Rohou. Runtime On-Stack Parallelization of Dependence-Free For-Loops in Binary Programs. IEEE Letters of the Computer Society, 2019, 2 (1), pp.1-4. ⟨10.1109/LOCS.2019.2896559⟩. ⟨hal-02061340⟩