A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures

Abstract: Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that enable us to efficiently prune the search space. Our method is automatic, fast and reliable. The tuning process is performed entirely at install time, in less than one hour and ten minutes on five out of seven platforms. We achieve an average performance varying from 97% to 100% of the optimum performance depending on the platform. This work is a basis for autotuning the PLASMA library and enabling easy performance portability across hardware systems.
Document type:
Conference paper
Jeannot, Emmanuel and Namyst, Raymond and Roman, Jean. Euro-Par 2011 Parallel Processing, Aug 2011, Bordeaux, France. Springer Berlin / Heidelberg, 6853, pp.194-205, 2011, Lecture Notes in Computer Science. <http://dx.doi.org/10.1007/978-3-642-23397-5_19>
https://hal.inria.fr/hal-00726654
Contributor: Emmanuel Agullo
Submitted on: Friday, 31 August 2012 - 00:57:01
Last modified on: Thursday, 10 September 2015 - 01:08:34

Identifiers

  • HAL Id: hal-00726654, version 1

Citation

Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanimire Tomov. A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures. Jeannot, Emmanuel and Namyst, Raymond and Roman, Jean. Euro-Par 2011 Parallel Processing, Aug 2011, Bordeaux, France. Springer Berlin / Heidelberg, 6853, pp.194-205, 2011, Lecture Notes in Computer Science. <http://dx.doi.org/10.1007/978-3-642-23397-5_19>. <hal-00726654>
