The LINPACK Benchmark: past, present and future, Concurrency and Computation: Practice and Experience, vol.15, issue.9, pp.803-820, 2003.
DOI : 10.1002/cpe.728
Parallel and cache-efficient in-place matrix storage format conversion, pp.5-6, 2010.
Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-756, 1997.
DOI : 10.1147/rd.416.0737
Scaling LAPACK panel operations using parallel cache assignment, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10, 2010.
DOI : 10.1145/1693453.1693484
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.8762
Autotuning GEMMs for Fermi
DOI : 10.1109/tpds.2011.311
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.421.5106
Automated empirical optimizations of software and the ATLAS project, Parallel Computing, vol.27, issue.1-2, pp.3-35, 2001.
DOI : 10.1016/S0167-8191(00)00087-9