Exploiting Two-Level Parallelism by Aggregating Computing Resources in Task-Based Applications Over Accelerator-Based Machines

Terry Cojean 1, 2
1 STORM - STatic Optimizations, Runtime Methods
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: Computing platforms are now extremely complex, providing an increasing number of CPUs and accelerators. This trend makes balancing computations between these heterogeneous resources performance-critical. In this paper, we tackle the task granularity problem: we propose aggregating several CPUs to execute larger parallel tasks, and thus find a better balance between the workload assigned to the CPUs and that assigned to the GPUs. To this end, we rely on the notion of scheduling contexts to isolate the parallel tasks and delegate the management of the task parallelism to the inner scheduling strategy. We demonstrate the relevance of our approach through the dense Cholesky factorization kernel implemented on top of the StarPU task-based runtime system. We allow elementary tasks to be parallel and to use the Intel MKL parallel implementation, which is optimized through the OpenMP runtime system. We show how our approach handles the interaction between the StarPU and OpenMP runtime systems and how it exploits the parallelism of modern accelerator-based machines. We present experimental results showing that our solution outperforms state-of-the-art implementations, reaching a peak performance of 4.5 TFlop/s on a platform equipped with 20 CPU cores and 4 GPU devices.
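To make the granularity trade-off mentioned in the abstract concrete, the following minimal Python sketch enumerates the tasks of a standard right-looking tiled Cholesky factorization on an nt x nt grid of tiles. It is not taken from the talk and does not use the StarPU API: the function names `tiled_cholesky_tasks` and `count_by_kernel` and the tuple encoding of tasks are assumptions made for illustration. The kernel names (POTRF, TRSM, SYRK, GEMM) follow the usual LAPACK/BLAS naming.

```python
from collections import Counter


def tiled_cholesky_tasks(nt):
    """List the tasks of a right-looking tiled Cholesky on an nt x nt tile grid.

    Hypothetical helper for illustration only; each task is a tuple
    (kernel name, tile indices). No numerical work is performed.
    """
    tasks = []
    for k in range(nt):
        tasks.append(("POTRF", (k, k)))            # factor the diagonal tile (k, k)
        for i in range(k + 1, nt):
            tasks.append(("TRSM", (i, k)))         # triangular solve on panel tile (i, k)
        for i in range(k + 1, nt):
            tasks.append(("SYRK", (i, i, k)))      # update diagonal tile (i, i) with (i, k)
            for j in range(k + 1, i):
                tasks.append(("GEMM", (i, j, k)))  # update tile (i, j) with (i, k) and (j, k)
    return tasks


def count_by_kernel(nt):
    """Count how many tasks of each kernel type the factorization generates."""
    return Counter(kernel for kernel, _ in tiled_cholesky_tasks(nt))


if __name__ == "__main__":
    # For a fixed matrix size, fewer tiles per side means larger but fewer tasks.
    for nt in (2, 4, 8):
        print(nt, dict(count_by_kernel(nt)))
```

For a fixed matrix size, a smaller nt means larger tiles and far fewer tasks. Large tiles tend to suit GPUs, but they leave few tasks to share among many individual CPU cores, which is the imbalance that the abstract proposes to address by running each large task in parallel on a group of aggregated cores.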
Document type:
Conference paper
SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP 2016), Apr 2016, Paris, France. 2016, 〈http://www.siam.org/meetings/pp16/〉
Cited literature [4 references]

https://hal.inria.fr/hal-01502749
Contributor: Terry Cojean
Submitted on: Monday, April 24, 2017 - 17:13:12
Last modified on: Thursday, January 11, 2018 - 06:27:21

License

Copyright (All rights reserved)

Identifiers

  • HAL Id: hal-01502749, version 1

Citation

Terry Cojean. Exploiting Two-Level Parallelism by Aggregating Computing Resources in Task-Based Applications Over Accelerator-Based Machines. SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP 2016), Apr 2016, Paris, France. 2016, 〈http://www.siam.org/meetings/pp16/〉. 〈hal-01502749〉

Metrics

Record views: 123

File downloads: 9