J. J. Dongarra, P. Luszczek, and A. Petitet, The LINPACK Benchmark: past, present and future, Concurrency and Computation: Practice and Experience, vol.15, issue.9, pp.803-820, 2003.
DOI : 10.1002/cpe.728

F. G. Gustavson, L. Karlsson, and B. Kågström, Parallel and cache-efficient in-place matrix storage format conversion, pp.5-6, 2010.

F. G. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-756, 1997.
DOI : 10.1147/rd.416.0737

A. M. Castaldo and R. C. Whaley, Scaling LAPACK panel operations using parallel cache assignment, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10, 2010.
DOI : 10.1145/1693453.1693484
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.8762

J. Kurzak, S. Tomov, and J. Dongarra, Autotuning GEMMs for Fermi
DOI : 10.1109/tpds.2011.311
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.421.5106

R. Whaley, A. Petitet, and J. Dongarra, Automated empirical optimizations of software and the ATLAS project, Parallel Computing, vol.27, issue.1-2, pp.3-35, 2001.
DOI : 10.1016/S0167-8191(00)00087-9