LAPACK: A portable linear algebra library for high-performance computers, Proceedings SUPERCOMPUTING '90, pp.2-11, 1990.
DOI : 10.1109/SUPERC.1990.129995
Static LU Decomposition on Heterogeneous Platforms, International Journal of High Performance Computing Applications, vol.15, issue.3, pp.310-323, 2001.
DOI : 10.1177/109434200101500308
URL : https://hal.archives-ouvertes.fr/hal-00856641
ScaLAPACK user's guide, 1997.
Algorithms for LU decomposition on a shared memory multiprocessor, Parallel Computing, vol.19, issue.8, pp.925-937, 1993.
DOI : 10.1016/0167-8191(93)90075-V
Prefetching irregular references for software cache on Cell, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.155-164, 2008.
DOI : 10.1145/1356058.1356079
Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture, IBM Systems Journal, vol.45, issue.1, pp.59-84, 2006.
DOI : 10.1147/sj.451.0059
The design and implementation of a first-generation Cell processor, Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), 2005.
IBM. Cell BE programming tutorial.
IBM. Cell Broadband Engine SDK libraries v3.0, 2008.
Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?, Proceedings of the International Conference on Compiler Construction, 2010.
DOI : 10.1007/978-3-642-11970-5_15
Introduction to the Cell multiprocessor, IBM Journal of Research and Development, vol.49, issue.4.5, pp.589-604, 2005.
DOI : 10.1147/rd.494.0589
Automatic blocking of QR and LU factorizations for locality, Proceedings of the 2004 workshop on Memory system performance, MSP '04, pp.12-22, 2004.
DOI : 10.1145/1065895.1065898
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?, PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.203-212, 2010.