J. Kurzak, S. Tomov, and J. Dongarra, Autotuning GEMM kernels for the Fermi GPU, IEEE Transactions on Parallel and Distributed Systems, issue.991, 2012.

M. S. Lam, E. E. Rothberg, and M. E. Wolf, The cache performance and optimizations of blocked algorithms, ACM SIGPLAN Notices, vol.26, issue.4, pp.63-74, 1991.
DOI : 10.1145/106973.106981

A. C. McKellar and E. G. Coffman, Jr., Organizing matrices and matrix operations for paged memory systems, Communications of the ACM, vol.12, issue.3, pp.153-165, 1969.
DOI : 10.1145/362875.362879

R. Nath, S. Tomov, and J. Dongarra, An improved MAGMA GEMM for Fermi GPUs, 2010.
DOI : 10.1177/1094342010385729