Architecture for the Next Generation System Management Tools for Distributed Computing Platforms
Reports (Research Report), Year: 2010


Abstract

In order to get more results or greater accuracy, computational scientists mainly execute parallel or distributed applications and try to scale them up. Accordingly, they use more and more distributed resources: local large-scale HPC systems, grids, or even clouds. However, in most cases, the use and management of such platforms is static: the application generally has to be adapted to the environment, rather than the environment being adapted to the application's needs. In addition, platforms are managed through time and space partitioning, mainly via batch schedulers: time partitioning enables the execution of several applications on the same resources, and space partitioning enables the execution of applications across several distributed resources. This leads to usage limitations, where applications can only be executed on a subset of the available resources. Therefore, scientists have to manage technical details related to the execution of their applications on each target HPC platform, which can require application modifications, rather than focusing on the science. In this article, we advocate for a system management tool that enables the transparent configuration of the HPC platform and the customization of the execution environment for large-scale HPC systems (such as clusters or MPPs), grids, and clouds. We propose a new approach to managing these systems more dynamically, in which resources can be configured and reconfigured automatically and transparently. The proposed solution does not remove the benefits of resource management systems such as batch systems (they still provide a well-known interface for job submission), but rather redefines the underlying system capabilities. Our approach is based on a refinement of the concepts of emulation and virtualization introduced by Goldberg.
Furthermore, the proposed approach leads to the definition of a method that provides a single interface to scientists for the deployment and management of their applications on HPC platforms. This method is based on two concepts: (i) the Virtual System Environment (VSE) and (ii) Virtual Platforms (VPs).
Main file: RR-7325.pdf (578.98 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00494328, version 1 (22-06-2010)

Identifiers

  • HAL Id: inria-00494328, version 1

Cite

Jérôme Gallard, Geoffroy Vallée, Thomas J. Naughton, Adrien Lebre, Stephen L. Scott, et al. Architecture for the Next Generation System Management Tools for Distributed Computing Platforms. [Research Report] RR-7325, INRIA. 2010. ⟨inria-00494328⟩