Architecture for the Next Generation System Management Tools for Distributed Computing Platforms

Jérôme Gallard 1 Geoffroy Vallée 2 Thomas Naughton 2 Adrien Lèbre 3, 4 Stephen Scott 2 Christine Morin 1
1 MYRIADS - Design and Implementation of Autonomous Distributed Systems
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique
3 ASCOLA - Aspect and composition languages
LINA - Laboratoire d'Informatique de Nantes Atlantique, Département informatique - EMN, Inria Rennes – Bretagne Atlantique
Abstract: To obtain more results or greater accuracy, computational scientists mainly run parallel or distributed applications and try to scale them up. Accordingly, they use more and more distributed resources, drawing on local large-scale HPC systems, grids, or even clouds. However, in most cases, the use and management of such platforms is static: generally, the application has to be adapted to the environment, rather than the environment being adapted to the application's needs. In addition, platforms are managed through time and space partitioning, mainly via batch schedulers: time partitioning enables the execution of several applications on the same resources, and space partitioning enables the execution of applications across several distributed resources. This leads to usage limitations, where applications can only be executed on a subset of the available resources. As a result, scientists have to manage technical details related to the execution of their applications on each target HPC platform, which may require application modifications, rather than focusing on the science. In this article, we advocate a system management tool that enables the transparent configuration of the HPC platform and the customization of the execution environment for large-scale HPC systems (such as clusters or MPPs), grids, and clouds. We propose a new approach to managing these systems more dynamically, in which resources can be configured and reconfigured automatically and transparently. The proposed solution does not remove the benefit of resource management systems such as batch systems (they still provide a well-known interface for job submission), but rather redefines the underlying system capabilities. Our approach is based on a refinement of the concepts of emulation and virtualization introduced by Goldberg.
Furthermore, the proposed approach leads to the definition of a method that provides scientists with a single interface for the deployment and management of their applications on HPC platforms. This method is based on two concepts: (i) the Virtual System Environment (VSE), and (ii) the Virtual Platform (VP).
Document type:
[Research Report] RR-7325, INRIA. 2010
Contributor: Jérôme Gallard
Submitted on: Tuesday, 22 June 2010 - 18:00:57
Last modified on: Saturday, 24 March 2018 - 01:44:21
Document(s) archived on: Monday, 22 October 2012 - 14:41:30




  • HAL Id: inria-00494328, version 1


Jérôme Gallard, Geoffroy Vallée, Thomas Naughton, Adrien Lèbre, Stephen Scott, et al. Architecture for the Next Generation System Management Tools for Distributed Computing Platforms. [Research Report] RR-7325, INRIA. 2010. 〈inria-00494328〉


