Theses

Apprehending heterogeneity at (very) large scale

Abstract : The demand for computing power is steadily increasing, driven by the need to simulate ever more complex phenomena that consume and produce growing amounts of data. To meet this demand, High Performance Computing platforms grow in both size and heterogeneity. Heterogeneity allows problems to be split so that sub-problems can be solved more efficiently with ad hoc hardware or algorithms. It arises both in the platforms' architecture and in the variety of applications they process. Consequently, performance becomes more sensitive to the execution context. In this dissertation, we study how to qualitatively bring context-awareness/obliviousness into allocation and scheduling policies at a reasonable cost. The study is conducted from two standpoints: within single applications, and at the scale of the whole platform from an inter-application perspective. We first study the minimization of the makespan of sequential tasks on platforms with a mixed architecture composed of multiple CPUs and GPUs. We integrate context-awareness into schedulers with an affinity mechanism that improves local behavior. This mechanism has been implemented in a parallel run-time, and experiments show that it reduces memory transfers while maintaining a low makespan. We then extend the model to implicitly consider parallelism on the CPUs with the moldable-task model. We propose an efficient algorithm formulated as an integer linear program with a constant performance guarantee of 3/2+ε. Second, we devise a new modeling framework where constraints are a first-class tool. Rather than extending existing models to consider all possible interactions, we reduce the set of feasible schedules by further constraining existing models. We propose a set of reasonable constraints to model application spreading and I/O traffic. We then instantiate this framework for unidimensional topologies, and propose a comprehensive case study of makespan minimization under convex and local constraints.

Cited literature [84 references]

https://hal.archives-ouvertes.fr/tel-01722991
Contributor : Raphaël Bleuse
Submitted on : Monday, March 5, 2018 - 11:42:31 AM
Last modification on : Friday, May 25, 2018 - 10:31:49 AM
Document(s) archived on : Wednesday, June 6, 2018 - 2:48:14 PM

Files

manuscript.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01722991, version 1

Citation

Raphaël Bleuse. Apprehending heterogeneity at (very) large scale. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Grenoble Alpes, 2017. English. ⟨tel-01722991v1⟩

