A Methodology to Scale Containerized HPC Infrastructures in the Cloud - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2022

A Methodology to Scale Containerized HPC Infrastructures in the Cloud

Nicolas Grenèche
  • Role: Author
  • PersonId: 933243
Tarek Menouer
  • Role: Author
  • PersonId: 1217394
Christophe Cérin
  • Role: Author
  • PersonId: 1117366
Olivier Richard

Abstract

This paper introduces a generic method to scale HPC clusters on top of the Kubernetes cloud orchestrator. Users define their target infrastructure with the usual Kubernetes syntax for recipes, and our approach automatically translates the description into a full-fledged containerized HPC cluster. Moreover, resource extensions and shrinks are handled, allowing the containerized HPC cluster to be resized dynamically without disrupting its operation. The Kubernetes orchestrator acts as a provisioner. We applied the generic method to three open-source HPC schedulers with orthogonal architectural designs: SLURM, OAR, and OpenPBS. Through a series of experiments, the paper demonstrates the potential of our approach regarding the scalability issues of HPC clusters and the simultaneous deployment of several job schedulers on the same physical infrastructure. Note that our approach does not require any modification to either the container orchestrator or the HPC schedulers. Our proposal is a step toward reconciling the two ecosystems of HPC and cloud. It also calls for new research directions and concrete implementations for the dynamic consolidation of servers or sober placement policies at the orchestrator level. This work contributes a new approach to running HPC clusters in a cloud environment and tests the robustness of the technique by adding and removing nodes on the fly.
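
As a rough illustration of the idea in the abstract (a declared pool of containers is translated into a scheduler's view of the cluster, and that view is recomputed when the pool grows or shrinks), the Python sketch below may help. It is not the paper's implementation: the abstract does not describe the mechanism, and `ComputePool`, `render_slurm_nodes`, and `resize_plan` are hypothetical names introduced here. The sketch assumes that compute nodes run as pods of a Kubernetes StatefulSet (whose pods are named `<name>-<ordinal>`) and that SLURM is the target scheduler, configured through `NodeName=` and `PartitionName=` lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputePool:
    """Hypothetical description of a pool of containerized compute nodes."""
    name: str      # base name; StatefulSet pods are named <name>-<ordinal>
    replicas: int  # desired number of compute pods
    cpus: int      # CPUs advertised to the scheduler by each pod


def pod_names(pool: ComputePool) -> list[str]:
    """Return the pod names a StatefulSet with this many replicas would use."""
    return [f"{pool.name}-{i}" for i in range(pool.replicas)]


def render_slurm_nodes(pool: ComputePool, partition: str = "compute") -> str:
    """Render slurm.conf-style node and partition lines for the pool."""
    names = pod_names(pool)
    lines = [f"NodeName={n} CPUs={pool.cpus} State=UNKNOWN" for n in names]
    if names:
        lines.append(
            f"PartitionName={partition} Nodes={','.join(names)} "
            "Default=YES State=UP"
        )
    return "\n".join(lines)


def resize_plan(old: ComputePool, new: ComputePool) -> tuple[list[str], list[str]]:
    """Return (added, removed) node names when the pool is resized."""
    before, after = pod_names(old), pod_names(new)
    added = [n for n in after if n not in set(before)]
    removed = [n for n in before if n not in set(after)]
    return added, removed


if __name__ == "__main__":
    small = ComputePool(name="hpc-node", replicas=2, cpus=4)
    large = ComputePool(name="hpc-node", replicas=4, cpus=4)
    print(render_slurm_nodes(large))
    print(resize_plan(small, large))
```

In a real deployment the scheduler would also have to be told to reload its configuration and to drain nodes before they are removed; those steps are scheduler-specific and are left out of this sketch.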
Main file
Europar22 (1).pdf (303.94 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03946821, version 1 (19-01-2023)

License

Attribution - NonCommercial

Cite

Nicolas Grenèche, Tarek Menouer, Christophe Cérin, Olivier Richard. A Methodology to Scale Containerized HPC Infrastructures in the Cloud. Europar 2022, Aug 2022, Glasgow, United Kingdom. pp.203-217, ⟨10.1007/978-3-031-12597-3_13⟩. ⟨hal-03946821⟩