
Parallel Tensor Train through Hierarchical Decomposition

Laura Grigori ¹, Suraj Kumar ¹
1 ALPINES - Algorithms and parallel tools for integrated numerical simulations
INSMI - Institut National des Sciences Mathématiques et de leurs Interactions, Inria de Paris, LJLL (UMR_7598) - Laboratoire Jacques-Louis Lions
Abstract : We consider the problem of developing parallel decomposition and approximation algorithms for high-dimensional tensors. We focus on a tensor representation called Tensor Train (TT). It stores a d-dimensional tensor in O(ndr^2) entries, far fewer than the O(n^d) entries of the original tensor, where r is usually a very small number that depends on the application. Sequential algorithms for computing the TT decomposition and TT approximation of a tensor have been proposed in the literature. Here we propose a parallel algorithm to compute the TT decomposition of a tensor. We prove that the ranks of the TT representation produced by our algorithm are bounded by the ranks of the unfolding matrices of the tensor. Additionally, we propose a parallel algorithm to compute an approximation of a tensor in TT representation. Our algorithm relies on a hierarchical partitioning of the dimensions of the tensor into a balanced binary tree and on transmitting the leading singular values of the associated unfolding matrix from a parent to its children. We consider several approaches based on how the leading singular values are transmitted in the tree. We present an in-depth experimental analysis of these approaches for different low-rank tensors, and also assess them on tensors obtained from quantum chemistry simulations. Our results show that the approach which transmits the leading singular values to both of its children performs better in practice. The compression ratios and accuracies of the approximations obtained by our approaches are comparable to those of the sequential algorithm and, in some cases, even better. We also show that our algorithms exchange only O(log^2(P)log(d)) messages along the critical path for a d-dimensional tensor on P processors. The lower bound on the number of messages for any algorithm that exchanges data among P processors is log(P); our algorithms attain this bound up to a polylogarithmic factor.
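The sequential baseline the abstract refers to is the classical TT-SVD procedure, which sweeps over the dimensions, takes an SVD of each unfolding matrix, and keeps the leading singular vectors as a TT core. The sketch below illustrates that idea only; the function names `tt_svd` and `tt_to_full` and the relative truncation threshold `eps` are our own illustrative choices, not the parallel hierarchical algorithm of the paper.

```python
import numpy as np

def tt_svd(tensor, eps=1e-12):
    """Sequential TT-SVD sketch: factor a d-dim tensor into TT cores.

    Each core has shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1.
    Singular values below eps * (largest singular value) are truncated.
    """
    shape = tensor.shape
    cores = []
    C = tensor.copy()
    r = 1                                   # r_0 = 1
    for k in range(tensor.ndim - 1):
        C = C.reshape(r * shape[k], -1)     # k-th unfolding
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        rk = max(1, int(np.sum(S > eps * S[0])))   # truncated rank
        cores.append(U[:, :rk].reshape(r, shape[k], rk))
        C = S[:rk, None] * Vt[:rk]          # carry the remainder forward
        r = rk
    cores.append(C.reshape(r, shape[-1], 1))  # last core, r_d = 1
    return cores

def tt_to_full(cores):
    """Contract TT cores back into the full tensor (for verification)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=(-1, 0))
    return full.squeeze(axis=(0, -1))
```

With a negligible `eps` the decomposition is exact, so `tt_to_full(tt_svd(T))` reconstructs `T` up to floating-point error; the TT ranks it produces are bounded by the ranks of the unfolding matrices, mirroring the bound proved in the paper for the parallel algorithm.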
Document type :
Preprints, Working Papers, ...
Contributor : Suraj Kumar
Submitted on : Friday, February 19, 2021 - 2:40:09 PM
Last modification on : Friday, January 21, 2022 - 3:17:59 AM


  • HAL Id : hal-03081555, version 3


Laura Grigori, Suraj Kumar. Parallel Tensor Train through Hierarchical Decomposition. 2021. ⟨hal-03081555v3⟩


