
Active optimistic and distributed message logging for message-passing applications

Thomas Ropars 1, Christine Morin 1
1 MYRIADS - Design and Implementation of Autonomous Distributed Systems
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract: Message logging is an attractive way to provide fault tolerance for message-passing applications because it is more scalable than coordinated checkpointing. Sender-based message logging is a well-known optimization that saves message payloads in the sender's memory, so only message reception events have to be logged reliably, using an event logger. This paper proposes solutions to further improve the scalability of message logging protocols. Existing work on message logging has always treated the event logger as a centralized process. We propose a distributed event logger that takes advantage of multi-core processors to run in parallel with application processes, and that uses the volatile memory of the nodes to save events reliably. We also propose combining our distributed event logger with O2P, an active optimistic message logging protocol that uses a gossip-based protocol to disseminate information on new stable events. Our distributed event logger and O2P are implemented in the Open MPI library. Our results show that (i) distributed event logging improves the scalability of message logging protocols and (ii) using O2P with a distributed event logger provides an efficient and scalable fault-tolerant solution for message-passing applications.
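To make the division of labour described in the abstract concrete, the following is a minimal sketch of sender-based message logging, not the paper's O2P protocol or its Open MPI implementation. It assumes synchronous, in-memory delivery and a single logger object; the names `Determinant`, `EventLogger`, `Process`, and `replay` are hypothetical and chosen only for illustration. Payloads stay in the sender's memory, while only the reception event (who sent what, and in which order it was delivered) goes to the event logger.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """A reception event: which message was delivered, and in what order."""
    sender: int
    send_seq: int   # sequence number assigned by the sender
    receiver: int
    recv_seq: int   # delivery order at the receiver


@dataclass
class EventLogger:
    """Hypothetical stand-in for reliable storage of reception events."""
    events: list = field(default_factory=list)

    def log(self, det):
        self.events.append(det)

    def events_for(self, receiver):
        dets = [d for d in self.events if d.receiver == receiver]
        return sorted(dets, key=lambda d: d.recv_seq)


class Process:
    def __init__(self, rank, logger):
        self.rank = rank
        self.logger = logger
        self.send_seq = 0
        self.recv_seq = 0
        self.payload_log = {}   # (dest rank, send_seq) -> payload, in sender memory
        self.delivered = []

    def send(self, dest, payload):
        self.send_seq += 1
        # Sender-based logging: the payload is kept locally, not on stable storage.
        self.payload_log[(dest.rank, self.send_seq)] = payload
        dest.receive(self.rank, self.send_seq, payload)

    def receive(self, sender, send_seq, payload):
        self.recv_seq += 1
        # Only the reception event is sent to the event logger.
        self.logger.log(Determinant(sender, send_seq, self.rank, self.recv_seq))
        self.delivered.append(payload)


def replay(failed_rank, peers, logger):
    """Rebuild the delivery sequence of a failed process from surviving peers."""
    by_rank = {p.rank: p for p in peers}
    return [by_rank[d.sender].payload_log[(failed_rank, d.send_seq)]
            for d in logger.events_for(failed_rank)]
```

In this sketch, recovering a failed receiver means reading its logged reception events in delivery order and asking each sender for the matching payload. A distributed event logger, as proposed in the abstract, would replace the single `EventLogger` object with several cooperating ones; that part is not modelled here.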
Document type :
Journal articles
Contributor : Christine Morin
Submitted on : Monday, September 3, 2012 - 4:58:31 PM
Last modification on : Thursday, January 20, 2022 - 4:20:00 PM




Thomas Ropars, Christine Morin. Active optimistic and distributed message logging for message-passing applications. Concurrency and Computation: Practice and Experience, Wiley, 2011, 23 (17), pp.2167-2178. ⟨10.1002/cpe.1775⟩. ⟨hal-00727470⟩


