Abstract: LU decomposition is a widely used method for solving dense linear systems in many scientific computing applications. In recent years, single instruction, multiple data (SIMD) technology has become a popular way to accelerate LU decomposition. However, pipeline parallelism and memory bandwidth utilization are low when LU decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism across all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and reaches about 89% of the peak performance of the SIMD processor.
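The abstract does not spell out the "native algorithm" it uses as a baseline. As a minimal sketch, assuming the baseline is the textbook in-place LU (Doolittle) factorization without pivoting, it might look like the Python below. The names `lu_in_place` and `split_lu` are hypothetical and chosen only for illustration. The innermost loop walks along matrix rows, so with row-major storage consecutive iterations touch adjacent elements, which is the access pattern that "coalesced" refers to. The sketch does not reproduce the paper's pipelined SIMD mapping.

```python
def lu_in_place(a):
    """Factor a square matrix (list of row lists) as A = L * U, without pivoting.

    Assumption: this is the textbook Doolittle scheme, not the paper's
    pipelined variant. The matrix is overwritten: U ends up on and above
    the diagonal, and the multipliers of L (unit diagonal implied) below it.
    """
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        row_k = a[k]
        for i in range(k + 1, n):
            row_i = a[i]
            # Multiplier for row i; it is stored in place as L[i][k].
            factor = row_i[k] / pivot
            row_i[k] = factor
            # Row-wise update of the trailing submatrix: j runs along a row,
            # so neighbouring iterations read and write neighbouring elements.
            for j in range(k + 1, n):
                row_i[j] -= factor * row_k[j]
    return a


def split_lu(a):
    """Unpack the combined in-place result into separate L and U matrices."""
    n = len(a)
    lower = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


if __name__ == "__main__":
    matrix = [[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]]
    lower, upper = split_lu(lu_in_place(matrix))
    print("L =", lower)
    print("U =", upper)
```

In this form each elimination step `k` depends on the rows updated in step `k - 1`. These are the data dependences that the abstract says the fine-grained algorithm exploits.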
Document type:
Conference paper
Ching-Hsien Hsu; Xiaoming Li; Xuanhua Shi; Ran Zheng. 10th International Conference on Network and Parallel Computing (NPC), Sep 2013, Guiyang, China. Springer, Lecture Notes in Computer Science, LNCS-8147, pp.39-48, 2013, Network and Parallel Computing. 〈10.1007/978-3-642-40820-5_4〉
https://hal.inria.fr/hal-01513757
Contributor: Hal Ifip
Submitted on: Tuesday, April 25, 2017 - 14:33:24
Last modified on: Tuesday, April 25, 2017 - 14:35:51
Document(s) archived on: Wednesday, July 26, 2017 - 13:56:15
Kai Zhang, Shuming Chen, Wei Liu, Xi Ning. A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors. Ching-Hsien Hsu; Xiaoming Li; Xuanhua Shi; Ran Zheng. 10th International Conference on Network and Parallel Computing (NPC), Sep 2013, Guiyang, China. Springer, Lecture Notes in Computer Science, LNCS-8147, pp.39-48, 2013, Network and Parallel Computing. 〈10.1007/978-3-642-40820-5_4〉. 〈hal-01513757〉