Journal article, Journal of Parallel and Distributed Computing, Year: 2018

DITVA: Dynamic Inter-Thread Vectorization Architecture

Abstract

In the Single-Program Multiple-Data (SPMD) programming model, threads of an application exhibit very similar control flows and often execute the same instructions, but on different data. In this paper, we propose the Dynamic Inter-Thread Vectorization Architecture (DITVA) to leverage the implicit data-level parallelism that exists across threads in SPMD applications. By assembling dynamic vector instructions at runtime, DITVA extends an in-order SMT processor with a dynamic inter-thread vector execution mode akin to the Single-Instruction, Multiple-Thread model of Graphics Processing Units. In this mode, multiple scalar threads running in lockstep share a single instruction stream, and their respective instruction instances are aggregated into SIMD instructions. DITVA can leverage existing SIMD units and maintains binary compatibility with existing CPU architectures. To balance thread-level and data-level parallelism, threads are statically grouped into fixed-size, independently scheduled warps. Additionally, to maximize dynamic vectorization opportunities, we adapt the fetch steering policy to favor thread synchronization within warps and thus improve lockstep execution. Our experimental evaluation of the DITVA architecture on the SPMD applications from the PARSEC and Rodinia OpenMP benchmarks shows that a 4-warp × 4-lane 4-issue DITVA architecture with a realistic bank-interleaved cache achieves 1.55× higher performance than a 4-thread 4-issue SMT architecture with AVX instructions, while fetching and issuing 51% fewer instructions and achieving an overall 24% energy reduction. DITVA also enables memory-limited applications to scale with higher-bandwidth architectures. For instance, when the bandwidth is increased from 2 GB/s to 16 GB/s, we find that memory-bound applications show a 3× performance improvement in comparison with the baseline SMT.
Therefore, DITVA appears as a cost-effective design for achieving very high single-core performance on SPMD parallel sections.
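The aggregation step described in the abstract can be illustrated with a small software model. The sketch below is an assumption-laden toy, not the hardware mechanism from the paper: it represents each thread of a warp only by its program counter (PC), groups lanes that share a PC, and emits one masked "vector instruction" per distinct PC. The function names `form_vector_instructions` and `fetched_instructions` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def form_vector_instructions(warp_pcs):
    """Group the threads of one warp by program counter.

    warp_pcs[lane] is the PC of the thread occupying that lane.
    Returns a list of (pc, mask) pairs sorted by PC, one per distinct PC.
    mask is a tuple of booleans marking the lanes that execute at that PC.
    This is a toy model (assumption), not the paper's hardware design.
    """
    lanes_by_pc = defaultdict(list)
    for lane, pc in enumerate(warp_pcs):
        lanes_by_pc[pc].append(lane)
    width = len(warp_pcs)
    return [
        (pc, tuple(lane in lanes for lane in range(width)))
        for pc, lanes in sorted(lanes_by_pc.items())
    ]


def fetched_instructions(warp_pcs):
    """Number of instruction fetches needed to advance every thread once."""
    return len(form_vector_instructions(warp_pcs))


# Four threads in lockstep: a single fetch serves all four lanes.
lockstep = form_vector_instructions([0x40, 0x40, 0x40, 0x40])

# Two threads diverged to another path: two partially masked instructions.
diverged = form_vector_instructions([0x40, 0x40, 0x80, 0x80])
```

In this model, a warp whose threads stay in lockstep needs one fetch per step instead of one per thread, which is the intuition behind the reduction in fetched and issued instructions reported in the abstract. Divergent threads lower the benefit, which is why a fetch policy that helps threads resynchronize matters.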
Origin: Files produced by the author(s)

Dates and versions

hal-01655904, version 1 (05-12-2017)

Cite

Sajith Kalathingal, Caroline Collange, Bharath N Swamy, André Seznec. DITVA: Dynamic Inter-Thread Vectorization Architecture. Journal of Parallel and Distributed Computing, 2018, pp.1-32. ⟨10.1016/j.jpdc.2017.11.006⟩. ⟨hal-01655904⟩