J. O. Eklundh, A fast computer method for matrix transposing, IEEE Transactions on Computers, vol.21, issue.7, pp.801-803, 1972.

S. D. Kaushik, C. Huang, R. W. Johnson, P. Sadayappan, and J. R. Johnson, Efficient transposition algorithms for large matrices, Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, pp.656-665, 1993.

S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, and P. Sadayappan, Efficient parallel out-of-core matrix transposition, International Journal on High Performance Computing and Networking, vol.2, issue.4, pp.110-119, 2006.

A. Zekri, Restructuring and implementations of 2D matrix transpose algorithm using SSE4 vector instructions, 2015 International Conference on Applied Research in Computer Science and Engineering (ICAR), pp.1-7, 2015.

J. Suh and V. K. Prasanna, An efficient algorithm for out-of-core matrix transposition, IEEE Transactions on Computers, vol.51, issue.4, pp.420-438, 2002.

S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, and P. Sadayappan, Efficient parallel out-of-core matrix transposition, 2003 Proceedings IEEE International Conference on Cluster Computing, pp.300-307, 2003.

F. Gustavson, L. Karlsson, and B. Kågström, Parallel and cache-efficient in-place matrix storage format conversion, ACM Trans. Math. Softw., vol.38, issue.3, pp.1-17, 2012.

D. P. Bovet and M. Cesati, Understanding the Linux Kernel, 3rd Edition: from I/O ports to process management, 2005.

A. Aggarwal and J. S. Vitter, The input/output complexity of sorting and related problems, Communications of the ACM, vol.31, issue.9, pp.1116-1127, 1988.
URL: https://hal.archives-ouvertes.fr/inria-00075827

M. Shao, S. Schlosser, S. Papadomanolakis, J. Schindler, A. Ailamaki et al., MultiMap: Preserving disk locality for multidimensional datasets, International Conference on Data Engineering, pp.926-935, 2007.

R. Thonangi and J. Yang, Permuting data on random-access block storage, Proceedings of the VLDB Endowment, vol.6, pp.721-732, 2013.