Dynamic Inter-Thread Vectorization Architecture: extracting DLP from TLP

Abstract: Threads of Single-Program Multiple-Data (SPMD) applications often execute the same instructions on different data. We propose the Dynamic Inter-Thread Vectorization Architecture (DITVA) to leverage this implicit data-level parallelism in SPMD applications by assembling dynamic vector instructions at runtime. DITVA extends an SIMD-enabled in-order SMT processor with an inter-thread vectorization execution mode. In this mode, multiple scalar threads running in lockstep share a single instruction stream and their respective instruction instances are aggregated into SIMD instructions. To balance thread- and data-level parallelism, threads are statically grouped into fixed-size independently scheduled warps. DITVA leverages existing SIMD units and maintains binary compatibility with existing CPU architectures. Our evaluation on the SPMD applications from the PARSEC and Rodinia OpenMP benchmarks shows that a 4-warp × 4-lane 4-issue DITVA architecture with a realistic bank-interleaved cache achieves 1.55× higher performance than a 4-thread 4-issue SMT architecture with AVX instructions while fetching and issuing 51% fewer instructions, achieving an overall 24% energy reduction.
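The aggregation idea in the abstract can be illustrated with a toy model. The sketch below is not the DITVA hardware design: it assumes each thread is given as a list of instruction addresses it executes in order, and it assumes a "lowest address first" policy for choosing which group of threads in a warp advances. The function name `count_instructions` and the example traces are hypothetical and chosen only for illustration.

```python
def count_instructions(traces, warp_size):
    """Compare scalar and inter-thread-vectorized instruction counts.

    traces: list of per-thread instruction-address sequences.
    warp_size: number of threads statically grouped into one warp.
    Returns (scalar_count, vector_count).
    """
    # Without vectorization, every thread issues each of its instructions.
    scalar = sum(len(t) for t in traces)
    vector = 0
    # Static grouping: consecutive threads form fixed-size warps.
    for start in range(0, len(traces), warp_size):
        warp = traces[start:start + warp_size]
        pos = [0] * len(warp)  # next instruction index of each thread
        while True:
            active = [i for i in range(len(warp)) if pos[i] < len(warp[i])]
            if not active:
                break
            # Assumed policy: serve the lowest pending address first.
            pc = min(warp[i][pos[i]] for i in active)
            # Threads at that address execute together as one
            # dynamic vector instruction, one lane per thread.
            for i in active:
                if warp[i][pos[i]] == pc:
                    pos[i] += 1
            vector += 1
    return scalar, vector


# Four threads: two take one side of a branch (address 2), two take
# the other (address 3), then all reconverge at address 4.
example_traces = [
    [0, 1, 2, 4],
    [0, 1, 2, 4],
    [0, 1, 3, 4],
    [0, 1, 3, 4],
]
print(count_instructions(example_traces, warp_size=4))  # (16, 5)
print(count_instructions(example_traces, warp_size=2))  # (16, 8)
```

In this toy model, larger warps issue fewer instructions when threads follow the same path, while smaller warps keep more independently scheduled instruction streams, which is the thread-level versus data-level trade-off the abstract mentions.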
Document type:
Conference paper
International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD), Oct 2016, Los Angeles, United States

Cited literature: 28 references

https://hal.inria.fr/hal-01356202
Contributor: Sylvain Collange
Submitted on: Thursday, 25 August 2016 - 14:53:02
Last modified on: Wednesday, 16 May 2018 - 11:24:11

File

ditva.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01356202, version 1

Citation

Sajith Kalathingal, Sylvain Collange, Bharath Narasimha Swamy, André Seznec. Dynamic Inter-Thread Vectorization Architecture: extracting DLP from TLP. International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD), Oct 2016, Los Angeles, United States. 〈hal-01356202〉

Metrics

Record views

942

File downloads

291