Optimization of Triangular Matrix Functions in BLAS Library on Loongson2F

Abstract: BLAS (Basic Linear Algebra Subprograms) plays a very important role in scientific computing and engineering applications. ATLAS is often recommended as a way to generate an optimized BLAS library. Building on ATLAS, this paper optimizes the algorithms of the triangular matrix functions for the architecture of the 750 MHz Loongson 2F processor. Loop unrolling, instruction scheduling, and data prefetching reduce both computation time and memory access latency, and thus improve the performance of the functions. Experimental results indicate that these optimization techniques effectively reduce the running time of the functions. After optimization, the double-precision TRSM function reaches 1300 Mflops and the single-precision version reaches 1800 Mflops. Compared with ATLAS, the performance of TRSM improves by 50% to 60%, and by as much as 100% to 200% for small inputs.
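The abstract names loop unrolling as one of the techniques applied to TRSM. As a generic illustration only (the authors' Loongson 2F kernels are not reproduced here), the C sketch below solves a lower-triangular system L·X = alpha·B in place by column-oriented forward substitution and unrolls the inner update loop by four. The function name `trsm_llnn_unrolled`, the column-major layout, the non-unit diagonal, and the unroll factor of 4 are assumptions chosen for the example; instruction scheduling and data prefetching are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: solve L * X = alpha * B in place (B is overwritten by X).
 * L: m x m lower-triangular with non-unit diagonal, column-major, leading dimension lda.
 * B: m x n, column-major, leading dimension ldb.
 * The inner update loop is unrolled by 4, with a scalar remainder loop. */
void trsm_llnn_unrolled(int m, int n, double alpha,
                        const double *L, int lda, double *B, int ldb)
{
    assert(lda >= m && ldb >= m);  /* basic layout preconditions */

    for (int j = 0; j < n; ++j) {
        double *b = B + (size_t)j * ldb;  /* column j of B */

        if (alpha != 1.0)
            for (int i = 0; i < m; ++i)
                b[i] *= alpha;

        for (int k = 0; k < m; ++k) {
            const double *lk = L + (size_t)k * lda;  /* column k of L */
            double xk = b[k] / lk[k];  /* x_k is final once earlier columns are applied */
            b[k] = xk;

            int i = k + 1;
            /* Unrolled by 4: four independent updates b[i] -= L[i][k] * x_k per iteration. */
            for (; i + 3 < m; i += 4) {
                b[i]     -= lk[i]     * xk;
                b[i + 1] -= lk[i + 1] * xk;
                b[i + 2] -= lk[i + 2] * xk;
                b[i + 3] -= lk[i + 3] * xk;
            }
            /* Remainder when the remaining row count is not a multiple of 4. */
            for (; i < m; ++i)
                b[i] -= lk[i] * xk;
        }
    }
}
```

Unrolling cuts loop-control overhead and exposes four independent multiply-subtract operations per iteration, which a compiler or a hand scheduler can interleave to hide floating-point and load latency. This is the kind of opportunity that instruction scheduling then exploits.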
Document type:
Conference paper
Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.35-45, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_5〉

Cited literature: 10 references

https://hal.inria.fr/hal-01054958
Contributor: Hal Ifip
Submitted on: Monday, 11 August 2014, 09:51:00
Last modified on: Friday, 11 August 2017, 17:44:24
Document(s) archived on: Thursday, 27 November 2014, 10:56:33

File

llncs.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Yun Xu, Mingzhi Shao, Da Teng. Optimization of Triangular Matrix Functions in BLAS Library on Loongson2F. Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.35-45, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_5〉. 〈hal-01054958〉
