Conference papers

Communication-Aware Load Balancing of the LU Factorization over Heterogeneous Clusters

Abstract : Large clusters and supercomputers are rapidly evolving, and regular hardware updates increase their chances of becoming heterogeneous. Even nominally homogeneous clusters may show variable performance because of processor manufacturing differences, or may contain partitions equipped with different types of accelerators. Data distribution over heterogeneous nodes is very challenging but essential to exploit all resources efficiently. In this article, we build upon the flexibility of task-based runtimes to study the interplay between static communication-aware data distribution strategies and dynamic scheduling of the LU factorization over heterogeneous sets of hybrid nodes. We propose two techniques derived from the state-of-the-art 1D×1D data distributions. The first uses fewer computing nodes towards the end of the factorization, to better match performance bounds and save computing power. The second carefully moves a few blocks between nodes to further improve load balancing. We also demonstrate how 1D×1D data distributions, tailored for heterogeneous nodes, can scale better on homogeneous clusters than classical block-cyclic distributions. Validation is carried out in both real and simulated environments, on homogeneous and heterogeneous platforms, and demonstrates compelling performance improvements.
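To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's 1D×1D algorithm. It compares how many blocks each node owns under a classical 2D block-cyclic mapping with a simple allocation proportional to node speed. The function names, the 2×2 process grid, and the relative speeds are hypothetical values chosen only for illustration, and the sketch ignores communication costs entirely.

```python
from collections import Counter


def block_cyclic_owner(i, j, p, q):
    """Rank owning block (i, j) on a p x q process grid (2D block-cyclic)."""
    return (i % p) * q + (j % q)


def proportional_counts(n_items, speeds):
    """Split n_items among nodes proportionally to their speeds.

    Uses the largest-remainder rule so the counts always sum to n_items.
    """
    total = sum(speeds)
    shares = [n_items * s / total for s in speeds]
    counts = [int(x) for x in shares]
    # Hand the leftover items to the nodes with the largest fractional parts.
    leftover = n_items - sum(counts)
    order = sorted(range(len(speeds)),
                   key=lambda k: shares[k] - counts[k],
                   reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


n_blocks = 8  # hypothetical matrix of n_blocks x n_blocks blocks

# Block-cyclic gives every node the same number of blocks, whatever its speed.
cyclic = Counter(block_cyclic_owner(i, j, 2, 2)
                 for i in range(n_blocks) for j in range(n_blocks))

# Hypothetical relative speeds: one fast node, one medium, two slow.
weighted = proportional_counts(n_blocks * n_blocks, [4.0, 2.0, 1.0, 1.0])

print(dict(cyclic))
print(weighted)
```

With equal block counts, the slowest node bounds the completion time on a heterogeneous platform, which is the imbalance that speed-aware distributions aim to avoid.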

Contributor : Arnaud Legrand
Submitted on : Wednesday, May 27, 2020 - 2:04:39 PM
Last modification on : Tuesday, November 24, 2020 - 4:00:18 PM




  • HAL Id : hal-02633985, version 1


Lucas Nesi, Lucas Mello Schnorr, Arnaud Legrand. Communication-Aware Load Balancing of the LU Factorization over Heterogeneous Clusters. IEEE International Conference on Parallel and Distributed Systems (ICPADS), Dec 2020, Hong Kong. ⟨hal-02633985⟩


