Analysis of the Jobs Resource Utilization on a Production System

Joseph Emeras; Cristian Ruiz; Jean-Marc Vincent; Olivier Richard

Conference paper, Year: 2013

Joseph Emeras

Role: Author
PersonId: 911806

Middleware efficiently scalable

Cristian Ruiz

Role: Author

Middleware efficiently scalable

Jean-Marc Vincent

Role: Author
PersonId: 750922
IdHAL: jean-marc-vincent
ORCID: 0000-0003-3576-2024

Middleware efficiently scalable

Olivier Richard

Role: Author
PersonId: 4299
IdHAL: olivier-richard
ORCID: 0009-0005-8679-2874
IdRef: 118127438

Middleware efficiently scalable

Abstract

In the HPC community, the System Utilization metric is used to determine whether the resources of a cluster are efficiently used by the batch scheduler. This metric assumes that all allocated resources (memory, disk, processors, etc.) are utilized full-time. To optimize system performance, we have to consider the effective physical consumption by jobs relative to their resource allocations. This information gives insight into whether the cluster resources are efficiently used by the jobs. In this work we propose an analysis of production clusters based on job resource utilization. The principle is to simultaneously collect traces from the job scheduler (provided by logs) and traces of job resource consumption. The latter was achieved by developing a job monitoring tool whose impact on the system was measured to be lightweight (0.35% slowdown). The key point is to statistically analyze both traces to detect and explain underutilization of the resources. This could make it possible to detect abnormal behavior and bottlenecks in the cluster that lead to poor scalability, and to justify optimizations such as gang scheduling or best-effort scheduling. This method has been applied to two medium-sized production clusters over a period of eight months.
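The distinction the abstract draws between allocated and effectively consumed resources can be illustrated with a minimal sketch. It is not the authors' monitoring tool or their analysis: the `Job` record, its field names, and the three functions are hypothetical, and only CPU is considered, assuming each job's scheduler log entry has already been joined with its measured CPU time.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical record joining a scheduler log entry with a monitoring trace."""
    cores_allocated: int   # cores reserved by the scheduler
    runtime_s: float       # wall-clock duration of the job, in seconds
    cpu_time_s: float      # CPU seconds actually consumed, from monitoring


def system_utilization(jobs, total_cores, period_s):
    """Allocated core-seconds divided by available core-seconds."""
    allocated = sum(j.cores_allocated * j.runtime_s for j in jobs)
    return allocated / (total_cores * period_s)


def effective_utilization(jobs, total_cores, period_s):
    """Consumed CPU-seconds divided by available core-seconds."""
    consumed = sum(j.cpu_time_s for j in jobs)
    return consumed / (total_cores * period_s)


def job_efficiency(job):
    """Fraction of a job's allocated core-seconds that it actually used."""
    return job.cpu_time_s / (job.cores_allocated * job.runtime_s)


# Illustrative data only: two 8-core jobs filling a 16-core cluster for one hour.
jobs = [
    Job(cores_allocated=8, runtime_s=3600, cpu_time_s=8 * 3600),  # all cores busy
    Job(cores_allocated=8, runtime_s=3600, cpu_time_s=2 * 3600),  # mostly idle
]

print(system_utilization(jobs, total_cores=16, period_s=3600))
print(effective_utilization(jobs, total_cores=16, period_s=3600))
print(job_efficiency(jobs[1]))
```

In this made-up example the scheduler-level metric reports a fully used cluster, while the consumption-based figure is lower because the second job leaves most of its allocated cores idle. That gap is the kind of underutilization the paper sets out to detect and explain.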

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

emeras.pdf (1.09 MB)

Origin: Files produced by the author(s)

Contributor: Arnaud Legrand

https://inria.hal.science/hal-00918372

Submitted on: Friday, December 13, 2013, 14:11:53

Last modified on: Thursday, April 4, 2024, 21:15:30

Long-term archiving on: Tuesday, March 18, 2014, 12:35:41

Dates and versions

hal-00918372, version 1 (13-12-2013)

Identifiers

HAL Id: hal-00918372, version 1

Cite

Joseph Emeras, Cristian Ruiz, Jean-Marc Vincent, Olivier Richard. Analysis of the Jobs Resource Utilization on a Production System. Job Scheduling Strategies for Parallel Processing, 2013, Boston, United States. ⟨hal-00918372⟩

Collections

UGA CNRS INRIA LIG INRIA2 POLYTECH-GRENOBLE LIG_SIDCH
