
Architecture for the Next Generation System Management Tools for Distributed Computing Platforms

Jérôme Gallard 1 Geoffroy Vallée 2 Thomas Naughton 2 Adrien Lebre 3, 4 Stephen Scott 2 Christine Morin 1
1 MYRIADS - Design and Implementation of Autonomous Distributed Systems
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique
3 ASCOLA - Aspect and composition languages
LINA - Laboratoire d'Informatique de Nantes Atlantique, Département informatique - EMN, Inria Rennes – Bretagne Atlantique
Abstract : To obtain more results or greater accuracy, computational scientists mainly run parallel or distributed applications and try to scale them up. Accordingly, they use more and more distributed resources: local large-scale HPC systems, grids, or even clouds. However, in most cases, the use and management of such platforms is static: generally, the application has to be adapted to the environment, rather than the environment being adapted to the application's needs. In addition, platforms are managed through time and space partitioning, mainly via batch schedulers: time partitioning enables the execution of several applications on the same resources, and space partitioning enables the execution of applications across several distributed resources. This leads to usage limitations, where applications can only be executed on a subset of the available resources. As a result, scientists have to manage technical details related to executing their applications on each target HPC platform, which may require application modifications, rather than focusing on the science. In this article, we advocate for a system management tool that enables the transparent configuration of the HPC platform and the customization of the execution environment for large-scale HPC systems (such as clusters or MPPs), grids, and clouds. We propose a new approach that manages these systems more dynamically, where resources can be configured and reconfigured automatically and transparently. The proposed solution does not remove the benefits of resource management systems such as batch systems (they still provide a well-known interface for job submission), but rather redefines the underlying system capabilities. Our approach is based on a refinement of the concepts of emulation and virtualization introduced by Goldberg.
Furthermore, the proposed approach leads to the definition of a method that provides scientists with a single interface for deploying and managing their applications on HPC platforms. This method is based on two concepts: (i) the Virtual System Environment (VSE) and (ii) the Virtual Platform (VP).

Contributor : Jérôme Gallard
Submitted on : Tuesday, June 22, 2010 - 6:00:57 PM
Last modification on : Thursday, January 20, 2022 - 5:28:05 PM
Long-term archiving on : Monday, October 22, 2012 - 2:41:30 PM




  • HAL Id : inria-00494328, version 1


Jérôme Gallard, Geoffroy Vallée, Thomas Naughton, Adrien Lebre, Stephen Scott, et al. Architecture for the Next Generation System Management Tools for Distributed Computing Platforms. [Research Report] RR-7325, INRIA. 2010. ⟨inria-00494328⟩


