
A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors

Abstract : LU decomposition is a widely used method for solving dense linear systems in many scientific computing applications. In recent years, single instruction multiple data (SIMD) technology has become a popular way to accelerate LU decomposition. However, pipeline parallelism and memory bandwidth utilization are low when LU decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism across all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and reaches about 89% of the peak performance of the SIMD processor.
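For readers unfamiliar with the baseline, the following is a minimal sketch of a textbook LU decomposition (Doolittle form, no pivoting) in plain Python. It is not the paper's pipelined SIMD implementation; the function name `lu_decompose` and the list-of-lists matrix layout are assumptions made for illustration. It does show the data dependence the abstract refers to: the update at step k+1 needs the rows produced by step k. The innermost loop walks along a row, which in a row-major layout touches contiguous elements.

```python
def lu_decompose(a):
    """Illustrative Doolittle LU without pivoting.

    Returns (l, u) such that a == l * u, where l is unit lower
    triangular and u is upper triangular. Hypothetical helper,
    not taken from the paper.
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy that becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        if u[k][k] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, n):
            # Multiplier for row i; depends on the result of step k-1.
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            # Row update: contiguous accesses along row i and row k.
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return l, u


def matmul(x, y):
    """Plain matrix product, used only to check the factorization."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]
```

Calling `lu_decompose([[4.0, 3.0], [6.0, 3.0]])` and multiplying the two factors with `matmul` reproduces the input matrix.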
Document type : Conference papers

Cited literature [19 references]
Contributor : Hal Ifip
Submitted on : Tuesday, April 25, 2017 - 2:33:24 PM
Last modification on : Tuesday, September 3, 2019 - 3:04:02 PM
Long-term archiving on : Wednesday, July 26, 2017 - 1:56:15 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Kai Zhang, Shuming Chen, Wei Liu, Xi Ning. A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors. 10th International Conference on Network and Parallel Computing (NPC), Sep 2013, Guiyang, China. pp.39-48, ⟨10.1007/978-3-642-40820-5_4⟩. ⟨hal-01513757⟩


