B. W. Bader and T. G. Kolda, Algorithm 862: MATLAB tensor classes for fast algorithm prototyping, ACM Transactions on Mathematical Software, vol.32, issue.4, pp.635-653, 2006.

B. W. Bader and T. G. Kolda, MATLAB Tensor Toolbox Version 2.6.

G. Ballard, N. Knight, and K. Rouse, Communication lower bounds for matricized tensor times Khatri-Rao product, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.557-567, 2018.

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2009.

L. De Lathauwer, P. Comon, B. De Moor, and J. Vandewalle, Higher-order power method: Application in independent component analysis, Proceedings NOLTA'95, pp.91-96, 1995.

L. De Lathauwer, B. De Moor, and J. Vandewalle, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM Journal on Matrix Analysis and Applications, vol.21, issue.4, pp.1324-1342, 2000.

J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, An extended set of FORTRAN basic linear algebra subprograms, ACM Trans. Math. Softw., vol.14, issue.1, pp.1-17, 1988.

J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, Algorithm 656: An extended set of basic linear algebra subprograms: Model implementation and test programs, ACM Trans. Math. Softw., vol.14, issue.1, pp.18-32, 1988.

M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran, Cache-oblivious algorithms, Foundations of Computer Science, 1999. 40th Annual Symposium on, pp.285-297, 1999.

K. Hayashi, G. Ballard, Y. Jiang, and M. J. Tobia, Shared-memory parallelization of MTTKRP for dense tensors, Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '18, pp.393-394, 2018.

A. Heinecke, G. Henry, M. Hutchinson, and H. Pabst, LIBXSMM: Accelerating small matrix multiplications by runtime code generation, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'16, p.11, 2016.

Intel Math Kernel Library reference manual, pp.30-31.

O. Kaya and B. Uçar, Parallel Candecomp/Parafac decomposition of sparse tensors using dimension trees, SIAM Journal on Scientific Computing, vol.40, issue.1, pp.99-130, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01397464

F. Kjolstad, S. Kamil, S. Chou, D. Lugato, and S. Amarasinghe, The Tensor Algebra Compiler, Proc. ACM Program. Lang., vol.1, issue.OOPSLA, p.29, 2017.

T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, vol.51, issue.3, pp.455-500, 2009.

J. Li, C. Battaglino, I. Perros, J. Sun, and R. Vuduc, An input-adaptive and in-place approach to dense tensor-times-matrix multiply, High Performance Computing, Networking, Storage and Analysis, 2015 SC International Conference for, vol.76, p.12, 2015.

J. Li, J. Sun, and R. Vuduc, HiCOO: Hierarchical storage of sparse tensors, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'18, 2018.

K. P. Lorton and D. S. Wise, Analyzing block locality in Morton-order and Morton-hybrid matrices, SIGARCH Comput. Archit. News, vol.35, issue.4, pp.6-12, 2007.

D. Matthews, High-performance tensor contraction without transposition, SIAM Journal on Scientific Computing, vol.40, issue.1, pp.1-24, 2018.

G. M. Morton, A computer oriented geodetic data base and a new technique in file sequencing

A. H. Phan, P. Tichavský, and A. Cichocki, Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations, IEEE Transactions on Signal Processing, vol.61, issue.19, pp.4834-4846, 2013.

S. Smith and G. Karypis, Tensor-matrix products with a compressed sparse tensor, Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 '15, vol.5, pp.1-5, 2015.

E. Solomonik, D. Matthews, J. R. Hammond, J. F. Stanton, and J. Demmel, A massively parallel tensor contraction framework for coupled-cluster computations, Journal of Parallel and Distributed Computing, vol.74, issue.12, pp.3176-3190, 2014.

P. Springer and P. Bientinesi, Design of a high-performance GEMM-like tensor-tensor multiplication, ACM Transactions on Mathematical Software, vol.44, issue.3, p.29, 2018.

F. G. Van Zee and R. A. van de Geijn, BLIS: A framework for rapidly instantiating BLAS functionality, ACM Transactions on Mathematical Software, vol.41, issue.3, p.33, 2015.

D. W. Walker, Morton ordering of 2D arrays for efficient access to hierarchical memory, The International Journal of High Performance Computing Applications, vol.32, issue.1, pp.189-203, 2018.

R. C. Whaley and J. J. Dongarra, Automatically tuned linear algebra software, in: Supercomputing, 1998. SC98, IEEE/ACM Conference on, 1998.

Z. Xianyi, W. Qian, and Z. Chothia, OpenBLAS, pp.30-31, 2014.

A. N. Yzelman and R. H. Bisseling, A cache-oblivious sparse matrix-vector multiplication scheme based on the Hilbert curve, Progress in Industrial Mathematics at ECMI 2010, pp.627-633, 2012.