V. Volkov and J. Demmel, Benchmarking GPUs to tune dense linear algebra, SC 2008: International Conference for High Performance Computing, Networking, Storage and Analysis, p.31, 2008.
DOI : 10.1109/SC.2008.5214359

S. Williams, L. Oliker, R. W. Vuduc, J. Shalf, K. A. Yelick et al., Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms, SC'07: Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007.

N. Bell and M. Garland, Implementing sparse matrix-vector multiplication on throughput-oriented processors, Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis, SC '09, pp.1-11, 2009.
DOI : 10.1145/1654059.1654078

M. Baskaran and R. Bordawekar, Optimizing Sparse Matrix-Vector Multiplication on GPUs, 2009.

L. Buatois, G. Caumon, and B. Lévy, Concurrent number cruncher: a GPU implementation of a general sparse linear solver, International Journal of Parallel, Emergent and Distributed Systems, vol.24, issue.3, pp.205-223, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00331906

S. G. Petiton and C. Weill-Duflo, Massively parallel preconditioners for the sparse conjugate gradient method, Parallel Processing: CONPAR 92 - VAPP V, Second Joint International Conference on Vector and Parallel Processing, pp.373-378, 1992.
DOI : 10.1007/3-540-55895-0_433