Global Resource Management for High Availability and Performance in a DSM-based Cluster

Christine Morin (1), Renaud Lottiaux (1)
(1) CAPS - Compilation, parallel architectures and system
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : High availability and performance are two desirable properties for the execution of long-running parallel scientific applications on software-DSM-based clusters. Global resource management in the operating system is one way to achieve these properties. To illustrate this approach, a system integrating a page-based shared virtual memory and a parallel file system for global management of memory and disk resources is presented. The main design issues include fault tolerance and the optimization of disk accesses in the context of a single-level storage system.
Document type :
Reports

https://hal.inria.fr/inria-00072975
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 11:29:53 AM
Last modification on : Friday, November 16, 2018 - 1:30:27 AM
Long-term archiving on : Sunday, April 4, 2010 - 9:11:50 PM

Identifiers

  • HAL Id : inria-00072975, version 1

Citation

Christine Morin, Renaud Lottiaux. Global Resource Management for High Availability and Performance in a DSM-based Cluster. [Research Report] RR-3694, INRIA. 1999. ⟨inria-00072975⟩
