Hierarchical DAG Scheduling for Hybrid Distributed Systems

Abstract : Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle to map the resulting multi-dimensional heterogeneity and to express algorithm parallelism, resulting in sub-optimal effective performance. Task-based programming paradigms can alleviate some of the programming challenges on distributed hybrid many-core architectures. In this paper we take this concept a step further by showing that the potential of task-based programming paradigms can be greatly increased with minimal modification of the underlying runtime combined with the right algorithmic changes. We propose two novel recursive algorithmic variants for one-sided factorizations and describe the changes to the PaRSEC task-scheduling runtime needed to build a framework in which task granularity is dynamically adjusted, adapting the degree of available parallelism and kernel efficiency to runtime conditions. Based on an extensive set of results we show that, for one-sided factorizations, i.e., Cholesky and QR, a carefully written algorithm supported by an adaptive task-based runtime is capable of reaching a degree of performance and scalability never achieved before in distributed hybrid environments.
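To make the idea of a recursive one-sided factorization concrete, the following is a minimal sketch of a textbook recursive Cholesky factorization in plain Python. It is not the paper's algorithm and does not use PaRSEC: the function names and the `min_block` cutoff are hypothetical, chosen only for illustration. The cutoff plays the role of a granularity knob, in the sense that a larger value yields fewer, coarser calls to the base kernel and a smaller value yields more, finer ones.

```python
import math


def cholesky_unblocked(a):
    """Base kernel: return lower-triangular L such that a = L * L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def cholesky_recursive(a, min_block=2):
    """Recursive Cholesky on a symmetric positive definite matrix.

    min_block is a hypothetical granularity cutoff: blocks of this size
    or smaller are handed to the unblocked base kernel.
    """
    n = len(a)
    if n <= min_block:
        return cholesky_unblocked(a)

    h = n // 2
    a11 = [row[:h] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]

    # Step 1: factor the leading block, A11 = L11 * L11^T.
    l11 = cholesky_recursive(a11, min_block)

    # Step 2: triangular solve for L21 in L21 * L11^T = A21, row by row.
    l21 = []
    for row in a21:
        x = [0.0] * h
        for j in range(h):
            s = row[j] - sum(x[k] * l11[j][k] for k in range(j))
            x[j] = s / l11[j][j]
        l21.append(x)

    # Step 3: update the trailing block, A22 <- A22 - L21 * L21^T.
    m = n - h
    s22 = [
        [a22[i][j] - sum(l21[i][k] * l21[j][k] for k in range(h))
         for j in range(m)]
        for i in range(m)
    ]

    # Step 4: factor the updated trailing block recursively.
    l22 = cholesky_recursive(s22, min_block)

    # Assemble L = [[L11, 0], [L21, L22]].
    top = [l11[i] + [0.0] * m for i in range(h)]
    bottom = [l21[i] + l22[i] for i in range(m)]
    return top + bottom
```

In a task-based setting, each of the four steps would typically become a task or a sub-graph of tasks, and the cutoff would be chosen at run time rather than fixed by the caller; how the paper does this is described in the full text, not in this sketch.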

https://hal.inria.fr/hal-01078359
Contributor : Mathieu Faverge
Submitted on : Tuesday, December 16, 2014 - 11:07:09 AM
Last modification on : Thursday, December 13, 2018 - 6:48:02 PM
Long-term archiving on : Monday, March 23, 2015 - 1:44:53 PM

File

recursive.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01078359, version 1

Citation

Wei Wu, Aurelien Bouteiller, George Bosilca, Mathieu Faverge, Jack Dongarra. Hierarchical DAG Scheduling for Hybrid Distributed Systems. IEEE International Parallel & Distributed Processing Symposium (IPDPS 2015), May 2015, Hyderabad, India. ⟨hal-01078359⟩
