Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization

Abstract : Non-negative matrix factorization (NMF), the problem of finding two non-negative low-rank factors whose product approximates an input matrix, is a useful tool for many data mining and scientific applications such as topic modeling in text mining and blind source separation in microscopy. In this paper, we focus on scaling algorithms for NMF to very large sparse datasets and massively parallel machines by employing effective algorithms, communication patterns, and partitioning schemes that leverage the sparsity of the input matrix. In the case of machine learning workflow, the computations after SpMM must deal with dense matrices, as Sparse-Dense matrix multiplication will result in a dense matrix. Hence, the partitioning strategy considering only SpMM will result in a huge imbalance in the overall workflow especially on computations after SpMM and in this specific case of NMF on non-negative least squares computations. Towards this, we consider two previous works developed for related problems, one that uses a fine-grained partitioning strategy using a point-to-point communication pattern and on that uses a checkerboard partitioning strategy using a collective-based communication pattern. We show that a combination of the previous approaches balances the demands of the various computations within NMF algorithms and achieves high efficiency and scalability. From the experiments, we could see that our proposed algorithm communicates atleast 4x less than the collective and achieves upto 100x speed up over the baseline FAUN on real world datasets. Our algorithm was experimented in two different super computing platforms and we could scale up to 32000 processors on Bluegene/Q.
Complete list of metadatas

Cited literature [37 references]  Display  Hide  Download

https://hal.inria.fr/hal-01849084
Contributor : Oguz Kaya <>
Submitted on : Wednesday, July 25, 2018 - 4:15:27 PM
Last modification on : Wednesday, August 21, 2019 - 11:02:06 AM
Long-term archiving on: Friday, October 26, 2018 - 2:33:14 PM

File

RR-9198.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01849084, version 1

Collections

Citation

Oguz Kaya, Ramakrishnan Kannan, Grey Ballard. Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization. [Research Report] RR-9198, Inria Bordeaux Sud-Ouest. 2018. ⟨hal-01849084⟩

Share

Metrics

Record views

472

Files downloads

204