Analysis of the Jobs Resource Utilization on a Production System

Joseph Emeras 1, Cristian Ruiz 1, Jean-Marc Vincent 1, Olivier Richard 1
1 MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: In the HPC community, the System Utilization metric is used to determine whether the resources of a cluster are efficiently used by the batch scheduler. This metric assumes that all allocated resources (memory, disk, processors, etc.) are fully utilized at all times. To optimize system performance, we have to consider the actual physical consumption of jobs relative to their resource allocations. This information gives insight into whether the cluster resources are efficiently used by the jobs. In this work we propose an analysis of production clusters based on job resource utilization. The principle is to simultaneously collect traces from the job scheduler (provided by logs) and traces of the jobs' resource consumption. The latter was achieved by developing a job monitoring tool, whose measured impact on the system is lightweight (0.35% slowdown). The key point is to statistically analyze both traces to detect and explain underutilization of resources. This makes it possible to detect abnormal behavior and bottlenecks in the cluster that lead to poor scalability, and to justify optimizations such as gang scheduling or best-effort scheduling. This method has been applied to two medium-sized production clusters over a period of eight months.
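As a rough illustration of the distinction drawn in the abstract, the sketch below contrasts allocation-based system utilization with utilization computed from measured consumption. It is not the authors' tool or analysis: the `Job` record, its field names, the 50% threshold, and the sample numbers are all hypothetical, and only processor cores are considered.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record joining a scheduler log entry with monitoring data.
    allocated_cores: int    # cores reserved by the batch scheduler
    runtime_s: float        # wall-clock run time in seconds
    mean_cores_used: float  # average number of busy cores seen by monitoring


def system_utilization(jobs, total_cores, period_s):
    """Allocation-based metric: treats every allocated core as fully busy."""
    allocated = sum(j.allocated_cores * j.runtime_s for j in jobs)
    return allocated / (total_cores * period_s)


def effective_utilization(jobs, total_cores, period_s):
    """Consumption-based metric: counts only the cores actually used."""
    used = sum(j.mean_cores_used * j.runtime_s for j in jobs)
    return used / (total_cores * period_s)


def underutilized(jobs, threshold=0.5):
    """Jobs whose measured use falls below a share of their allocation."""
    return [j for j in jobs
            if j.mean_cores_used / j.allocated_cores < threshold]


# Made-up sample: three jobs on a 32-core cluster observed for two hours.
jobs = [Job(8, 3600, 7.6), Job(16, 1800, 4.0), Job(4, 7200, 3.9)]
alloc_ratio = system_utilization(jobs, total_cores=32, period_s=7200)
used_ratio = effective_utilization(jobs, total_cores=32, period_s=7200)
suspects = underutilized(jobs)
```

The gap between the two ratios, together with the list of flagged jobs, is the kind of signal that joining scheduler traces with consumption traces is meant to expose.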
Document type:
Conference paper
Walfredo Cirne and Narayan Desai and Eitan Frachtenberg and Uwe Schwiegelshohn. Job Scheduling Strategies for Parallel Processing, 2013, Boston, United States. Springer, 2013, Lecture Notes in Computer Science

https://hal.inria.fr/hal-00918372
Contributor: Arnaud Legrand
Submitted on: Friday, December 13, 2013 - 14:11:53
Last modified on: Wednesday, November 29, 2017 - 15:20:00
Document(s) archived on: Tuesday, March 18, 2014 - 12:35:41

File

emeras.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00918372, version 1

Citation

Joseph Emeras, Cristian Ruiz, Jean-Marc Vincent, Olivier Richard. Analysis of the Jobs Resource Utilization on a Production System. Walfredo Cirne and Narayan Desai and Eitan Frachtenberg and Uwe Schwiegelshohn. Job Scheduling Strategies for Parallel Processing, 2013, Boston, United States. Springer, 2013, Lecture Notes in Computer Science. 〈hal-00918372〉
