Architecture for the Next Generation System Management Tools for High Performance Computing Platforms

Geoffroy Vallée (1), Thomas Naughton (1), Anand Tikotekar (1), Jérôme Gallard (2, *), Stephen Scott (1), Christine Morin (2)
* Corresponding author
2 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : Today, computational scientists mainly run parallel or distributed applications and try to scale them up to obtain more results or greater data precision. As a result, they use more and more distributed resources: local large-scale HPC systems (such as clusters or MPP systems), grids, or even clouds. These platforms are difficult to manage because they differ in nature, each abstracting to a different degree some of the complexity created by resource distribution. For instance, clusters and MPP systems are located on a single site and are composed of different "partitions" (e.g., I/O nodes, compute nodes). In grids, "virtual organizations (VOs)" are one of the main concepts; since VOs are global and multi-user, they abstract the complexity of both local resource management and account management away from the users. Finally, clouds provide a high degree of abstraction via the concept of "services", which can be implemented via direct privileged access to the hardware or via Internet-based services. All of these cases, however, require local management of resources and some kind of coordination (e.g., coordination between partitions, remote sites, or different administration domains). This document presents a detailed description of the architecture of our novel system-management tool, which can be used to manage clusters/MPP systems, grids, and clouds. The architecture is based on three concepts: (i) Virtual System Environment (VSE), (ii) Virtual Organizations (VOs), and (iii) Virtual Platforms (VPs).
Document type :
Reports

https://hal.inria.fr/inria-00424107
Contributor : Jérôme Gallard
Submitted on : Wednesday, October 14, 2009 - 9:59:27 AM
Last modification on : Monday, December 10, 2018 - 11:34:08 AM
Long-term archiving on : Tuesday, October 16, 2012 - 12:12:18 PM

File

RR-7062.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00424107, version 1

Citation

Geoffroy Vallée, Thomas Naughton, Anand Tikotekar, Jérôme Gallard, Stephen Scott, et al. Architecture for the Next Generation System Management Tools for High Performance Computing Platforms. [Research Report] RR-7062, INRIA. 2009. ⟨inria-00424107⟩
