L. E. Cannon, A cellular computer to implement the Kalman filter algorithm, 1969.

S. Toledo, A survey of out-of-core algorithms in numerical linear algebra, in: External Memory Algorithms and Visualization, pp.161-180, 1999.

J. Pineau, Y. Robert, F. Vivien, and J. Dongarra, Matrix product on heterogeneous master-worker platforms, PPoPP'2008, the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.53-62, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00803487

M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran, Cache-oblivious algorithms, FOCS'99, the 40th IEEE Symposium on Foundations of Computer Science, pp.285-298, 1999.

D. Irony, S. Toledo, and A. Tiskin, Communication lower bounds for distributed-memory matrix multiplication, Journal of Parallel and Distributed Computing, vol.64, issue.9, pp.1017-1026, 2004.
DOI : 10.1016/j.jpdc.2004.03.021

F. Broquedis, J. C. Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

T. Rolf, Cache organization and memory management of the Intel Nehalem computer architecture, 2009.

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer et al., Introduction to the Cell multiprocessor, IBM Journal of Research and Development, vol.49, issue.4.5, pp.589-604, 2005.
DOI : 10.1147/rd.494.0589

NVIDIA, CUDA Programming Guide, URL http, 2010.