
Lessons from FTM: an Experiment in the Design and Implementation of a Low Cost Fault Tolerant System

Gilles Muller 1 Michel Banâtre 1 Mireille Hue 1 Nadine Peyrouze 1 Bruno Rochat 1
1 SOLIDOR - Design of Distributed Operating Systems
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract : This report describes an experiment in the design of a general purpose fault tolerant system, FTM. The main objective of the FTM design was to implement a "low-cost" fault tolerant system that could be used on standard workstations. At the operating system level, our goal was to provide a methodology for the design of modular reliable operating systems, while offering fault tolerance transparency to user applications. In other words, porting an application to FTM should require only recompiling its source code, without modifying it. These objectives were achieved using the Mach micro-kernel and a modular set of reliable servers that implement application checkpoints and provide continuous system functions despite machine crashes. At the architectural level, our approach relies on a high performance stable storage implementation, called Stable Transactional Memory (STM), which can be implemented in either hardware or software. We first motivate our design choices, then detail the FTM implementation at both the architectural and operating system levels. We comment on the reasons for the evolution of our stable memory technology from hardware to software. Finally, we present a performance evaluation of the FTM prototype. We conclude with lessons learned and give some assessments.
Document type : Research Report

Cited literature [43 references]

Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 2:38:54 PM
Last modification on : Friday, February 4, 2022 - 3:25:33 AM
Long-term archiving on : Sunday, April 4, 2010 - 9:54:38 PM


  • HAL Id : inria-00074161, version 1


Gilles Muller, Michel Banâtre, Mireille Hue, Nadine Peyrouze, Bruno Rochat. Lessons from FTM: an Experiment in the Design and Implementation of a Low Cost Fault Tolerant System. [Research Report] RR-2517, INRIA. 1995. ⟨inria-00074161⟩


