A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors

Abstract: LU decomposition is a widely used method for solving dense linear algebra problems in many scientific computing applications. In recent years, single instruction multiple data (SIMD) technology has become a popular way to accelerate LU decomposition. However, pipeline parallelism and memory bandwidth utilization are low when LU decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism across all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and reaches about 89% of the peak performance of the SIMD processor.
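For orientation, the following is a minimal sketch of a plain, sequential LU decomposition (Doolittle form, no pivoting), which is the kind of baseline the abstract calls the "native algorithm". It is not the authors' pipelined SIMD implementation; the function name `lu_decompose` and the list-of-lists matrix layout are assumptions made only for this illustration.

```python
def lu_decompose(a):
    """Return (L, U) such that L times U equals a.

    Illustrative baseline only: assumes a is square and that every
    pivot is nonzero, since this sketch performs no pivoting.
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy that is reduced to U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for k in range(n):  # elimination step k depends on the result of step k-1
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = u[i][k] / pivot  # multiplier, stored in column k of L
            l[i][k] = factor
            # Row update: the same operation applied to every element of
            # row i, which is the data-parallel part of the algorithm.
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return l, u


a = [[4.0, 3.0], [6.0, 3.0]]
l, u = lu_decompose(a)
print(l)  # [[1.0, 0.0], [1.5, 1.0]]
print(u)  # [[4.0, 3.0], [0.0, -1.5]]
```

The outer loop over `k` carries the data dependence between elimination steps, while the innermost loop over `j` applies one multiply-subtract across a whole row and is the natural candidate for vector execution.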
Document type:
Conference paper
Ching-Hsien Hsu; Xiaoming Li; Xuanhua Shi; Ran Zheng. 10th International Conference on Network and Parallel Computing (NPC), Sep 2013, Guiyang, China. Springer, Lecture Notes in Computer Science, LNCS-8147, pp.39-48, 2013, Network and Parallel Computing. 〈10.1007/978-3-642-40820-5_4〉
Cited literature [19 references]

https://hal.inria.fr/hal-01513757
Contributor: Hal Ifip
Submitted on: Tuesday, April 25, 2017 - 14:33:24
Last modified on: Tuesday, April 25, 2017 - 14:35:51
Document(s) archived on: Wednesday, July 26, 2017 - 13:56:15

File

978-3-642-40820-5_4_Chapter.pd...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Kai Zhang, Shuming Chen, Wei Liu, Xi Ning. A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors. Ching-Hsien Hsu; Xiaoming Li; Xuanhua Shi; Ran Zheng. 10th International Conference on Network and Parallel Computing (NPC), Sep 2013, Guiyang, China. Springer, Lecture Notes in Computer Science, LNCS-8147, pp.39-48, 2013, Network and Parallel Computing. 〈10.1007/978-3-642-40820-5_4〉. 〈hal-01513757〉

Metrics

Record views: 49

File downloads: 140