
Dynamic Resource Management in a Cluster for Scalability and High-Availability

Pascal Gallard 1, Christine Morin 1, Renaud Lottiaux 1
1 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : To execute high-performance applications on a cluster, it is highly desirable to provide distributed services that globally manage the physical resources distributed over the cluster nodes. However, because a distributed service may use resources located on different nodes, it becomes sensitive to changes in the cluster configuration caused by node addition, reboot, or failure. In this paper, we propose a generic service that performs dynamic resource management in a cluster in order to provide distributed services with high availability and scalability. This service has been implemented in the Gobelins cluster operating system. The dynamic resource management service we propose makes node addition and reboot nearly transparent to all distributed services of Gobelins and, as a consequence, fully transparent to applications. In the event of a node failure, applications using resources located on the failed node need to be restarted from a previously saved checkpoint, but the availability of the cluster operating system is guaranteed, provided that its distributed services implement reconfiguration features.

Contributor : Rapport de Recherche Inria
Submitted on : Tuesday, May 23, 2006 - 8:12:12 PM
Last modification on : Friday, February 4, 2022 - 3:25:08 AM
Long-term archiving on : Sunday, April 4, 2010 - 11:00:07 PM


  • HAL Id : inria-00072241, version 1


Pascal Gallard, Christine Morin, Renaud Lottiaux. Dynamic Resource Management in a Cluster for Scalability and High-Availability. [Research Report] RR-4347, INRIA. 2002. ⟨inria-00072241⟩


