Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning?

Arnaud Legrand (1), Denis Trystram (2), Salah Zrigui (2)
(1) POLARIS - Performance analysis and optimization of LARge Infrastructures and Systems
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
(2) DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : Despite the impressive growth and size of supercomputers, the computational power they provide still cannot match the demand, so efficient and fair resource allocation is a critical task. Supercomputers use Resource and Job Management Systems to schedule applications, generally by relying on generic index policies such as First Come First Served and Shortest Processing Time First in combination with backfilling strategies. Unfortunately, such generic policies often fail to exploit specific characteristics of real workloads. In this work, we focus on improving the performance of online schedulers. We study mixed policies, which are created by combining multiple job characteristics in a weighted linear expression, as opposed to classical pure policies, which use only a single characteristic. This larger class of scheduling policies aims at providing more flexibility and adaptability. We use space coverage and black-box optimization techniques to explore this new space of mixed policies, and we study how they can adapt to changes in the workload. We perform an extensive experimental campaign through which we show that (1) even the best pure policy is far from optimal and (2) a carefully tuned mixed policy would significantly improve the performance of the system. (3) We also provide empirical evidence that there is no one-size-fits-all policy, by showing that the rapid workload evolution seems to prevent classical online learning algorithms from being effective.
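To make the distinction between pure and mixed policies concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the `Job` fields, the three chosen characteristics (submission time, processing time, requested cores), the example weights, and all function names are assumptions made for this example. The abstract only states that a mixed policy combines several job characteristics in a weighted linear expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; the fields are illustrative assumptions."""
    name: str
    submit_time: float      # time at which the job entered the queue
    processing_time: float  # (estimated) run time of the job
    cores: int              # number of requested cores


def fcfs_key(job):
    # Pure policy: First Come First Served uses only the submission time.
    return job.submit_time


def spf_key(job):
    # Pure policy: Shortest Processing Time First uses only the run time.
    return job.processing_time


def mixed_key(weights):
    # Mixed policy: weighted linear combination of several characteristics.
    # The weight values are free parameters to be tuned (assumed here).
    w_submit, w_proc, w_cores = weights

    def key(job):
        return (w_submit * job.submit_time
                + w_proc * job.processing_time
                + w_cores * job.cores)

    return key


def order(queue, key):
    # Lower score means higher priority in this sketch.
    return [job.name for job in sorted(queue, key=key)]


queue = [
    Job("a", submit_time=0.0, processing_time=100.0, cores=4),
    Job("b", submit_time=10.0, processing_time=5.0, cores=16),
    Job("c", submit_time=20.0, processing_time=50.0, cores=1),
]

fcfs_order = order(queue, fcfs_key)
spf_order = order(queue, spf_key)
mixed_order = order(queue, mixed_key((1.0, 0.1, 2.0)))
```

With these toy values the three policies produce three different queue orders, and a pure policy is recovered as a special case of a mixed one by setting all weights but one to zero, e.g. `mixed_key((1.0, 0.0, 0.0))` reproduces the FCFS order. A realistic setup would probably normalize the characteristics before weighting them; that step is omitted here.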
Document type :
Reports

Cited literature : 18 references

https://hal.inria.fr/hal-01896121
Contributor : Salah Zrigui
Submitted on : Monday, October 15, 2018 - 6:52:46 PM
Last modification on : Friday, June 28, 2019 - 4:01:54 PM
Long-term archiving on : Wednesday, January 16, 2019 - 4:20:35 PM

File

RR-9212.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01896121, version 1

Citation

Arnaud Legrand, Denis Trystram, Salah Zrigui. Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning?. [Research Report] Grenoble 1 UGA - Université Grenoble Alpes. 2018, pp.1-23. ⟨hal-01896121⟩
