High-performance parallel algorithms for the Tucker decomposition of higher order sparse tensors

Oguz Kaya; Bora Uçar

Report (Research Report) Year: 2015

Oguz Kaya

Role: Author
PersonId: 773991
ORCID: 0000-0002-4444-1516
IdRef: 204228441

Laboratoire de l'Informatique du Parallélisme

Optimisation des ressources : modèles, algorithmes et ordonnancement

Bora Uçar

Role: Author
PersonId: 177627
IdHAL: bora-ucar
ORCID: 0000-0002-4960-3545
IdRef: 185655491

Optimisation des ressources : modèles, algorithmes et ordonnancement

Laboratoire de l'Informatique du Parallélisme

Abstract

We investigate an efficient parallelization of a class of algorithms for the well-known Tucker decomposition of general $N$-dimensional sparse tensors. The targeted algorithms are iterative and use the alternating least squares method. At each iteration, for each dimension of an $N$-dimensional input tensor, the following operations are performed: (i) the tensor is multiplied with $(N - 1)$ matrices (TTM step); (ii) the product is then converted to a matrix; and (iii) a few leading left singular vectors of the resulting matrix are computed (SVD step) to update one of the matrices for the next TTM step. We propose an efficient parallelization of these algorithms for current supercomputers composed of compute nodes, where each node is a multi-core system. We reformulate the computation of $N$ successive TTM steps to increase the reuse of intermediate computation, which is of interest on its own. We discuss a set of preprocessing steps that take all computational decisions out of the main iteration of the algorithm, and we provide an intuitive row-wise shared-memory parallelism for the TTM and SVD steps. We consider a coarse-grain and a fine-grain computational scheme, investigate their data dependencies, and identify efficient communication schemes. We demonstrate how the computation of singular vectors in the SVD step can be carried out efficiently following the TTM step. Finally, we develop a hybrid MPI-OpenMP-based implementation of the overall algorithm and report speedup results on up to 2048 cores.
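The three operations listed in the abstract (TTM step, conversion to a matrix, SVD step) can be illustrated with a minimal sketch. It is a sequential, dense NumPy version of the iteration and is not the sparse, parallel MPI-OpenMP implementation described in the report. The function names `unfold`, `ttm`, `tucker_als`, and `reconstruct`, the random orthonormal initialization, and the fixed iteration count are assumptions made for illustration only.

```python
import numpy as np


def unfold(tensor, mode):
    """Convert a tensor to a matrix whose rows are indexed by `mode`."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def ttm(tensor, matrix, mode):
    """Contract axis `mode` of `tensor` with axis 0 of `matrix`.

    For a factor U of shape (I_mode, R_mode) this multiplies the tensor
    by U^T along `mode`, so that axis shrinks from I_mode to R_mode.
    """
    out = np.tensordot(tensor, matrix, axes=(mode, 0))
    # tensordot appends the new axis last; move it back into position.
    return np.moveaxis(out, -1, mode)


def tucker_als(tensor, ranks, n_iters=5, seed=0):
    """Dense sketch of the alternating iteration (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_modes = tensor.ndim
    # Assumption: random orthonormal starting factors.
    factors = [
        np.linalg.qr(rng.standard_normal((tensor.shape[n], ranks[n])))[0]
        for n in range(n_modes)
    ]
    for _ in range(n_iters):
        for n in range(n_modes):
            # (i) TTM step: multiply with the other N - 1 factor matrices.
            y = tensor
            for m in range(n_modes):
                if m != n:
                    y = ttm(y, factors[m], m)
            # (ii) convert the product to a matrix,
            # (iii) SVD step: keep the leading left singular vectors.
            u, _, _ = np.linalg.svd(unfold(y, n), full_matrices=False)
            factors[n] = u[:, : ranks[n]]
    # Core tensor: the input multiplied by all N factors.
    core = tensor
    for m in range(n_modes):
        core = ttm(core, factors[m], m)
    return core, factors


def reconstruct(core, factors):
    """Expand a core tensor and its factors back to a full tensor."""
    x = core
    for m, u in enumerate(factors):
        x = ttm(x, u.T, m)
    return x
```

Calling `tucker_als(x, (2, 2, 2))` on a third-order array `x` returns a 2 x 2 x 2 core and one orthonormal factor matrix per mode. This sketch recomputes every TTM product from scratch for each mode, which is the redundancy that the reformulation of successive TTM steps mentioned in the abstract aims to reduce.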

Domains

Computer Science [cs] / Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

RR-8801.pdf (794.1 KB)

Origin: Files produced by the author(s)

Contributor: Equipe Roma

https://inria.hal.science/hal-01219316

Submitted on: Thursday, October 22, 2015, 14:22:27

Last modified on: Tuesday, January 16, 2024, 16:29:31

Long-term archiving on: Friday, May 5, 2017, 12:52:39

Dates and versions

hal-01219316, version 1 (22-10-2015)

Identifiers

HAL Id: hal-01219316, version 1

Cite

Oguz Kaya, Bora Uçar. High-performance parallel algorithms for the Tucker decomposition of higher order sparse tensors. [Research Report] RR-8801, Inria - Research Centre Grenoble – Rhône-Alpes. 2015. ⟨hal-01219316⟩

Collections

ENS-LYON CNRS INRIA UNIV-LYON1 INRIA-RRRT INRIA2 GENCI LARA UDL

385 views

1610 downloads
