P. Bhat, C. Raghavendra, and V. Prasanna, Efficient collective communication in distributed heterogeneous systems, ICDCS'99 19th International Conference on Distributed Computing Systems, pp.15-24, 1999.

P. Bhat, C. Raghavendra, and V. Prasanna, Efficient collective communication in distributed heterogeneous systems, Journal of Parallel and Distributed Computing, vol.63, pp.251-263, 2003.

L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel et al., ScaLAPACK Users' Guide, SIAM, 1997.

L. E. Cannon, A cellular computer to implement the Kalman filter algorithm, 1969.

J. Dongarra, J. Pineau, Y. Robert, Z. Shi, and F. Vivien, Revisiting matrix product on master-worker platforms, 2006.
URL: https://hal.archives-ouvertes.fr/hal-00803474

D. Irony, S. Toledo, and A. Tiskin, Communication lower bounds for distributed-memory matrix multiplication, Journal of Parallel and Distributed Computing, vol.64, pp.1017-1026, 2004.

T. Saif and M. Parashar, Understanding the behavior and performance of non-blocking communications in MPI, Proceedings of Euro-Par 2004: Parallel Processing, vol.3149, pp.173-182, 2004.

S. Toledo, A survey of out-of-core algorithms in numerical linear algebra, External Memory Algorithms and Visualization, pp.161-180, 1999.

R. C. Whaley and J. Dongarra, Automatically tuned linear algebra software, Proceedings of the ACM/IEEE Symposium on Supercomputing (SC'98), 1998.