Autotuning GEMM kernels for the Fermi GPU, IEEE Transactions on Parallel and Distributed Systems, issue.991, 2012.
The cache performance and optimizations of blocked algorithms, ACM SIGPLAN Notices, vol.26, issue.4, pp.63-74, 1991.
DOI : 10.1145/106973.106981
Organizing matrices and matrix operations for paged memory systems, Communications of the ACM, vol.12, issue.3, pp.153-165, 1969.
DOI : 10.1145/362875.362879
An improved MAGMA GEMM for Fermi GPUs, 2010.
DOI : 10.1177/1094342010385729