B. W. Bader and T. G. Kolda, Algorithm 862: MATLAB tensor classes for fast algorithm prototyping, ACM TOMS, vol.32, issue.4, pp.635-653, 2006.

G. Ballard, N. Knight, and K. Rouse, Communication lower bounds for matricized tensor times Khatri-Rao product, IPDPS, pp.557-567, 2018.

A. Y. Grama, A. Gupta, and V. Kumar, Isoefficiency: Measuring the scalability of parallel algorithms and architectures, IEEE Parallel & Distributed Technology: Systems & Applications, vol.1, issue.3, pp.12-21, 1993.

F. Kjolstad, S. Kamil, S. Chou, D. Lugato, and S. Amarasinghe, The Tensor Algebra Compiler, Proc. ACM Program. Lang., vol.1, issue.OOPSLA, article 77, 29 pages, 2017.

T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, vol.51, issue.3, pp.455-500, 2009.

J. Li, C. Battaglino, I. Perros, J. Sun, and R. Vuduc, An input-adaptive and in-place approach to dense tensor-times-matrix multiply, SC'15, article 76, 12 pages, 2015.

D. Matthews, High-performance tensor contraction without transposition, SIAM Journal on Scientific Computing, vol.40, issue.1, pp.C1-C24, 2018.

G. M. Morton, A computer oriented geodetic data base and a new technique in file sequencing, 1966.

F. Pawlowski, B. Uçar, and A. N. Yzelman, High performance tensor-vector multiples on shared memory systems, Tech. Rep. 9274, 2019.

F. Pawlowski, B. Uçar, and A. N. Yzelman, A multi-dimensional Morton-ordered block storage for mode-oblivious tensor computations, Journal of Computational Science, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02082524

E. Solomonik, D. Matthews, J. R. Hammond, J. F. Stanton, and J. Demmel, A massively parallel tensor contraction framework for coupled-cluster computations, Journal of Parallel and Distributed Computing, vol.74, issue.12, pp.3176-3190, 2014.

P. Springer and P. Bientinesi, Design of a high-performance gemm-like tensor-tensor multiplication, ACM TOMS, vol.44, issue.3, p.29, 2018.