E. Angerson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: A portable linear algebra library for high-performance computers, Proceedings SUPERCOMPUTING '90, pp.2-11, 1990.
DOI : 10.1109/SUPERC.1990.129995

O. Beaumont, A. Legrand, F. Rastello, and Y. Robert, Static LU Decomposition on Heterogeneous Platforms, International Journal of High Performance Computing Applications, vol.15, issue.3, pp.310-323, 2001.
DOI : 10.1177/109434200101500308
URL : https://hal.archives-ouvertes.fr/hal-00856641

L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel et al., ScaLAPACK Users' Guide, 1997.

J. J. Buoni, P. A. Farrell, and A. Ruttan, Algorithms for LU decomposition on a shared memory multiprocessor, Parallel Computing, vol.19, issue.8, pp.925-937, 1993.
DOI : 10.1016/0167-8191(93)90075-V

T. Chen, T. Zhang, Z. Sura, and M. G. Tallada, Prefetching irregular references for software cache on Cell, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.155-164, 2008.
DOI : 10.1145/1356058.1356079

A. E. Eichenberger, Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture, IBM Systems Journal, vol.45, issue.1, pp.59-84, 2006.
DOI : 10.1147/sj.451.0059

D. Pham, The design and implementation of a first-generation Cell processor, Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), 2005.

IBM, Cell BE programming tutorial.

IBM, Cell Broadband Engine SDK libraries v3.0, 2008.

Y. Jiang, E. Zhang, K. Tian, and X. Shen, Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?, Proceedings of the International Conference on Compiler Construction, 2010.
DOI : 10.1007/978-3-642-11970-5_15

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer et al., Introduction to the Cell multiprocessor, IBM Journal of Research and Development, vol.49, issue.4.5, pp.589-604, 2005.
DOI : 10.1147/rd.494.0589

Q. Yi, K. Kennedy, H. You, K. Seymour, and J. Dongarra, Automatic blocking of QR and LU factorizations for locality, Proceedings of the 2004 workshop on Memory system performance, MSP '04, pp.12-22, 2004.
DOI : 10.1145/1065895.1065898

E. Z. Zhang, Y. Jiang, and X. Shen, Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?, Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '10, pp.203-212, 2010.