Parallel Tensor Train through Hierarchical Decomposition - Archive ouverte HAL
Preprints, Working Papers, ... Year: 2021

Parallel Tensor Train through Hierarchical Decomposition

Laura Grigori (1), Suraj Kumar (1)
Abstract

We consider the problem of developing parallel decomposition and approximation algorithms for high-dimensional tensors. We focus on a tensor representation named Tensor Train (TT). It stores a d-dimensional tensor in O(ndr^2) entries, much less than the O(n^d) entries of the original tensor, where r is typically small and depends on the application. Sequential algorithms to compute the TT decomposition and the TT approximation of a tensor have been proposed in the literature. Here we propose a parallel algorithm to compute the TT decomposition of a tensor. We prove that the ranks of the TT representation produced by our algorithm are bounded by the ranks of the unfolding matrices of the tensor. Additionally, we propose a parallel algorithm to compute an approximation of a tensor in TT representation. Our algorithm relies on a hierarchical partitioning of the dimensions of the tensor in a balanced binary tree shape and on the transmission of the leading singular values of the associated unfolding matrix from a parent to its children. We consider several approaches based on how the leading singular values are transmitted in the tree. We present an in-depth experimental analysis of our approaches for different low-rank tensors and also assess them on tensors obtained from quantum chemistry simulations. Our results show that the approach which transmits the leading singular values to both of its children performs better in practice. The compression ratios and accuracies of the approximations obtained by our approaches are comparable to those of the sequential algorithm and, in some cases, even better. We also show that our algorithms transmit only O(log^2(P) log(d)) messages along the critical path for a d-dimensional tensor on P processors. The lower bound on the number of messages for any algorithm that exchanges data among P processors is log(P), and our algorithms achieve this bound modulo polylogarithmic factors.
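For context on the sequential baseline the abstract refers to: the TT decomposition of a tensor can be computed by the standard TT-SVD procedure, which repeatedly unfolds the remaining tensor into a matrix and truncates its SVD, producing one three-way core per dimension. The sketch below is a minimal plain-NumPy illustration (the function names `tt_svd` and `tt_to_full` and the truncation threshold `eps` are illustrative choices, not taken from the paper); it shows how a d-dimensional tensor is reduced to d cores of total size O(ndr^2):

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Sequential TT-SVD sketch: split a d-dimensional tensor into d cores.

    Core k has shape (r_{k-1}, n_k, r_k), so total storage is O(n d r^2)
    versus O(n^d) for the dense tensor.
    """
    dims = tensor.shape
    d = len(dims)
    cores = []
    r = 1                                   # r_0 = 1
    C = tensor
    for k in range(d - 1):
        C = C.reshape(r * dims[k], -1)      # unfold: rows (r_{k-1} * n_k)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        rk = max(1, int(np.sum(S > eps * S[0])))  # truncated TT-rank r_k
        cores.append(U[:, :rk].reshape(r, dims[k], rk))
        C = S[:rk, None] * Vt[:rk]          # carry the remainder forward
        r = rk
    cores.append(C.reshape(r, dims[-1], 1)) # r_d = 1
    return cores

def tt_to_full(cores):
    """Contract the TT cores back into the dense tensor (for checking)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A sum of two rank-1 terms: every TT-rank is at most 2.
    A = (np.einsum("i,j,k->ijk", *[rng.normal(size=n) for n in (4, 5, 6)])
         + np.einsum("i,j,k->ijk", *[rng.normal(size=n) for n in (4, 5, 6)]))
    cores = tt_svd(A)
    print([c.shape for c in cores])
    print(np.allclose(tt_to_full(cores), A))
```

The parallel algorithm of the paper replaces this left-to-right sweep with a hierarchical, balanced binary-tree partitioning of the dimensions, where each parent passes the leading singular values of its unfolding matrix down to its children.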
Main file: paralleltt.pdf (495.07 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03081555 , version 1 (18-12-2020)
hal-03081555 , version 2 (18-12-2020)
hal-03081555 , version 3 (19-02-2021)

Identifiers

  • HAL Id : hal-03081555 , version 3

Cite

Laura Grigori, Suraj Kumar. Parallel Tensor Train through Hierarchical Decomposition. 2021. ⟨hal-03081555v3⟩
