Dynamic Inter-Thread Vectorization Architecture: extracting DLP from TLP

Abstract: Threads of Single-Program Multiple-Data (SPMD) applications often execute the same instructions on different data. We propose the Dynamic Inter-Thread Vectorization Architecture (DITVA) to leverage this implicit data-level parallelism in SPMD applications by assembling dynamic vector instructions at runtime. DITVA extends an SIMD-enabled in-order SMT processor with an inter-thread vectorization execution mode. In this mode, multiple scalar threads running in lockstep share a single instruction stream, and their respective instruction instances are aggregated into SIMD instructions. To balance thread- and data-level parallelism, threads are statically grouped into fixed-size, independently scheduled warps. DITVA leverages existing SIMD units and maintains binary compatibility with existing CPU architectures. Our evaluation on SPMD applications from the PARSEC and Rodinia OpenMP benchmark suites shows that a 4-warp × 4-lane 4-issue DITVA architecture with a realistic bank-interleaved cache achieves 1.55× higher performance than a 4-thread 4-issue SMT architecture with AVX instructions, while fetching and issuing 51% fewer instructions and achieving an overall 24% energy reduction.
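To make the execution mode described above concrete, here is a minimal behavioral sketch, not the authors' implementation. Everything in it is an assumption for illustration: the toy ISA (`add`, `brodd`, `halt`), the helper names `step`, `make_warps` and `run_warp`, the grouping of consecutive thread ids into warps, and the rule that a warp fetches the instruction at the lowest PC among its live threads. Each warp fetches one instruction per step and executes it for every thread at that PC (an implicit activity mask), so threads that diverge on a branch are serialized and then merge back into one instruction stream when their PCs meet again.

```python
from dataclasses import dataclass

# Hypothetical toy ISA (an assumption for illustration, not DITVA's ISA):
#   ("add", k)    r += k
#   ("brodd", t)  jump to t if the thread id is odd, else fall through
#   ("halt", _)   the thread terminates
PROGRAM = [
    ("add", 1),      # 0: uniform work, all threads
    ("brodd", 3),    # 1: data-dependent (divergent) branch
    ("add", 10),     # 2: executed by even threads only
    ("add", 100),    # 3: reconvergence point
    ("halt", None),  # 4
]


@dataclass
class Thread:
    tid: int
    pc: int = 0
    r: int = 0
    executed: int = 0  # scalar instruction instances run by this thread
    done: bool = False


def step(t: Thread) -> None:
    """Execute one scalar instruction instance for thread t."""
    op, arg = PROGRAM[t.pc]
    t.executed += 1
    if op == "add":
        t.r += arg
        t.pc += 1
    elif op == "brodd":
        t.pc = arg if t.tid % 2 else t.pc + 1
    elif op == "halt":
        t.done = True
    else:
        raise ValueError(f"unknown opcode {op!r}")


def make_warps(n_threads: int, lanes: int) -> list[list[Thread]]:
    """Statically group consecutive threads into fixed-size warps."""
    threads = [Thread(tid) for tid in range(n_threads)]
    return [threads[i:i + lanes] for i in range(0, n_threads, lanes)]


def run_warp(warp: list[Thread]) -> int:
    """Run one warp to completion; return the number of SIMD instructions issued.

    Each step fetches a single instruction at the lowest PC among live threads
    (an assumed reconvergence heuristic) and executes it for every thread whose
    PC matches, i.e. the matching threads form the activity mask.
    """
    issued = 0
    while True:
        live = [t for t in warp if not t.done]
        if not live:
            return issued
        pc = min(t.pc for t in live)
        for t in live:
            if t.pc == pc:
                step(t)
        issued += 1


if __name__ == "__main__":
    warps = make_warps(16, 4)  # 4 warps x 4 lanes, as in the configuration above
    simd = sum(run_warp(w) for w in warps)
    scalar = sum(t.executed for w in warps for t in w)
    print(simd, scalar)  # 20 72
```

In this toy run, each warp issues 5 SIMD instructions, whereas its four threads would issue 18 scalar instructions on a plain SMT core (even threads take 5, odd threads take 4). These counts only illustrate the mechanism behind fetching and issuing fewer instructions; they are not the paper's measurements, and the sketch ignores memory accesses, the SIMD datapath and inter-warp scheduling.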
Document type: Conference paper

Cited literature: 28 references

https://hal.inria.fr/hal-01356202
Contributor: Sylvain Collange
Submitted on: Thursday, August 25, 2016, 2:53:02 PM
Last modification on: Thursday, February 7, 2019, 4:16:25 PM

File

ditva.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01356202, version 1

Citation

Sajith Kalathingal, Sylvain Collange, Bharath Narasimha Swamy, André Seznec. Dynamic Inter-Thread Vectorization Architecture: extracting DLP from TLP. International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD), Oct 2016, Los Angeles, United States. ⟨hal-01356202⟩
