LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors

Abstract : To meet the needs of high performance computing, the Cell Broadband Engine owns many features that differ from traditional processors, such as the large number of synergistic processor elements, large register files, the ability to hide main-storage latency with concurrent computation and DMA transfers. The exploitation of those features requires the programmer to carefully tailor programs and simutaneously deal with various performance factors, including locality, load balance, communication overhead, and multi-level parallelism. These factors, unfortunately, are dependent on each other; an optimization that enhances one factor may degrade another. This paper presents our experience on optimizing LU decomposition, one of the commonly used algebra kernels in scientific computing, on Cell Broadband Engine. The optimizations exploit task-level, data-level, and communication-level parallelism. We study the effects of different task distribution strategies, prefetch, and software cache, and explore the tradeoff among different performance factors, stressing the interactions between different optimizations. This work offers some insights in the optimizations on heterogenous multi-core processors, including the selection of programming models, considerations in task distribution, and the holistic perspective required in optimizations.
Type de document :
Communication dans un congrès
Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.61-75, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_7〉
Liste complète des métadonnées

Littérature citée [11 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01054956
Contributeur : Hal Ifip <>
Soumis le : lundi 11 août 2014 - 09:55:51
Dernière modification le : vendredi 11 août 2017 - 17:44:14
Document(s) archivé(s) le : jeudi 27 novembre 2014 - 10:52:39

Fichier

main.pdf
Fichiers produits par l'(les) auteur(s)

Licence


Distributed under a Creative Commons Paternité 4.0 International License

Identifiants

Citation

Feng Mao, Xipeng Shen. LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors. Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.61-75, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_7〉. 〈hal-01054956〉

Partager

Métriques

Consultations de la notice

87

Téléchargements de fichiers

65