Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization

Jack J. Dongarra; Mathieu Faverge; Hatem Ltaief; Piotr Luszczek

Rapport (Rapport De Recherche) Année : 2011

Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization

(1) , (1) , (2) , (1)

1
2

Jack J. Dongarra

Fonction : Auteur

Innovative Computing Laboratory [Knoxville]

Mathieu Faverge

Fonction : Auteur
PersonId : 261
IdHAL : mfaverge
ORCID : 0000-0002-2128-1230
IdRef : 140338217

Innovative Computing Laboratory [Knoxville]

Hatem Ltaief

Fonction : Auteur
PersonId : 938993

King Abdullah University of Science and Technology [Saudi Arabia]

Piotr Luszczek

Fonction : Auteur

Innovative Computing Laboratory [Knoxville]

Résumé

The LU factorization is an important numerical algorithm for solving systems of linear equations in science and engineering, and is characteristic of many dense linear algebra computations. It has even become the de facto numerical algorithm implemented within the LINPACK benchmark to rank the most powerful supercomputers in the world, collected bt the TOP500 website. In this context, the challenge in developing new algorithms for the scientific community resides in the combination of two goals: achieving high performance and maintaining the accuracy of the numerical algorithm. This paper proposes a novel approach for computing the LU factorization in parallel on multicore architectures, which not only improves the overall performance, but also sustains the numerical quality of the standard LU factorization algorithm with partial pivoting. While the update of the trailing submatrix is computationally intensive and highly parallel, the inherently problematic portion of the LU factorization is the panel factorization due to its memory-bound characteristic as well as the atomicity of selecting the appropriate pivots. Our approach uses a parallel fine-grained recursive formulation of the panel factorization step and implements the update of the trailing submatrix with the tile algorithm. Based on conflict-free partitioning of the data and lockless synchronization mechanisms, our implementation lets the overall computation flow naturally without contention. The dynamic runtime system called QUARK is then able to schedule tasks with heterogeneous granularities and to transparently introduce algorithmic lookahead. The performance results of our implementation are competitive compared to the currently available software packages and libraries. In particular, it is up to 40% faster when compared to the equivalent Intel MKL routine and up to 3-fold faster than LAPACK with multithreaded Intel MKL BLAS.

Domaines

Calcul parallèle, distribué et partagé [cs.DC]

Fichier principal

lawn259.pdf (618.53 Ko)

Origine : Fichiers produits par l'(les) auteur(s)

Mathieu Faverge : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-00809765

Soumis le : mardi 9 avril 2013-17:32:48

Dernière modification le : vendredi 9 février 2024-12:56:03

Archivage à long terme le : lundi 3 avril 2017-02:46:55

Dates et versions

hal-00809765 , version 1 (09-04-2013)

Identifiants

HAL Id : hal-00809765 , version 1

Citer

Jack J. Dongarra, Mathieu Faverge, Hatem Ltaief, Piotr Luszczek. Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization. [Research Report] 2011. ⟨hal-00809765⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

LARA

218 Consultations

380 Téléchargements

Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization

Résumé

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager