High performance tensor-vector multiplies on shared memory systems

Abstract : Tensor–vector multiplication is one of the core components in tensor computations. We have recently investigated high performance, single-core implementations of this bandwidth-bound operation. In this work, we investigate efficient shared-memory algorithms to carry out this operation. After carefully analyzing the design space, we implement a number of alternatives using OpenMP and compare them experimentally. Experimental results on systems with up to 8 sockets show near-peak performance for the proposed algorithms.
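
For readers unfamiliar with the operation, the following is a minimal sketch of a mode-k tensor–vector multiplication on a dense tensor stored row-major in a flat array. It is not the report's algorithm or any of its OpenMP variants. The function name `tensor_vector_multiply`, the row-major layout, and the loop order are assumptions made here for illustration only. Each input element is read once and used in a single multiply-add, which gives some intuition for why the abstract calls the operation bandwidth-bound.

```python
def tensor_vector_multiply(x, shape, k, v):
    """Contract mode k of a dense tensor with a vector (illustrative sketch).

    x     : flat list holding the tensor in row-major order (assumed layout)
    shape : sizes of the tensor modes, e.g. [2, 3, 2]
    k     : index of the mode to contract (0-based)
    v     : vector of length shape[k]

    Returns a flat row-major list for the tensor with mode k removed.
    """
    n_k = shape[k]
    if len(v) != n_k:
        raise ValueError("vector length must match the size of mode k")

    # Collapse the modes before k into `outer` and the modes after k into
    # `inner`, so the tensor can be viewed as outer x n_k x inner.
    outer = 1
    for n in shape[:k]:
        outer *= n
    inner = 1
    for n in shape[k + 1:]:
        inner *= n
    if len(x) != outer * n_k * inner:
        raise ValueError("tensor data does not match the given shape")

    y = [0.0] * (outer * inner)
    # Different values of `o` write to disjoint slices of y, so in a
    # shared-memory setting this outer loop is one candidate for splitting
    # across threads (an observation of this sketch, not a claim about the report).
    for o in range(outer):
        y_base = o * inner
        for j in range(n_k):
            vj = v[j]
            x_base = (o * n_k + j) * inner
            for i in range(inner):
                y[y_base + i] += x[x_base + i] * vj
    return y
```

For example, contracting the middle mode of a 2 x 3 x 2 tensor with a vector of length 3 leaves a 2 x 2 result stored as a flat list of four values.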
https://hal.inria.fr/hal-02123526
Contributor : Equipe Roma
Submitted on : Thursday, October 24, 2019 - 6:11:17 PM
Last modification on : Tuesday, November 19, 2019 - 2:40:26 AM

File

RR-9274.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02123526, version 2

Citation

Filip Pawłowski, Bora Uçar, Albert-Jan Yzelman. High performance tensor-vector multiplies on shared memory systems. [Research Report] RR-9274, Inria - Research Centre Grenoble – Rhône-Alpes. 2019, pp.1-20. ⟨hal-02123526v2⟩
