Journal article, Concurrency and Computation: Practice and Experience, 2011

Active optimistic and distributed message logging for message-passing applications

Thomas Ropars, Christine Morin

Abstract

Message logging is an attractive solution to provide fault tolerance for message-passing applications because it is more scalable than coordinated checkpointing. Sender-based message logging is a well-known optimization that saves message payloads in the sender's memory, so only message reception events have to be logged reliably, using an event logger. This paper proposes solutions to further improve the scalability of message logging protocols. In existing work on message logging, the event logger has always been considered a centralized process. We propose a distributed event logger that takes advantage of multi-core processors to run in parallel with application processes and that uses the volatile memory of the nodes to save events reliably. We also propose combining our distributed event logger with O2P, an active optimistic message logging protocol using a gossip-based protocol to disseminate information on new stable events. Our distributed event logger and O2P are implemented in the Open MPI library. Our results show that (i) distributed event logging improves message logging protocol scalability and (ii) using O2P with a distributed event logger provides an efficient and scalable fault-tolerant solution for message-passing applications.
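
The division of labor in sender-based message logging can be illustrated with a minimal single-machine sketch. It is not the paper's implementation and does not use Open MPI: the names `Determinant`, `EventLogger`, `Process`, and `recover` are hypothetical, the event logger is a plain in-memory list standing in for reliable storage, and neither the distributed event logger nor O2P is modeled. The sketch only shows that payloads stay in the sender's memory while the receiver logs small reception events, and that the two together are enough to replay deliveries in their original order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Determinant:
    """Reception event: which message was delivered, and in which order."""
    sender: int
    send_seq: int   # sequence number assigned by the sender
    receiver: int
    recv_seq: int   # delivery order at the receiver


class EventLogger:
    """Hypothetical stand-in for reliable storage of reception events."""

    def __init__(self):
        self.events = []

    def log(self, det):
        self.events.append(det)

    def events_for(self, receiver):
        # Return the receiver's events in original delivery order.
        own = [d for d in self.events if d.receiver == receiver]
        return sorted(own, key=lambda d: d.recv_seq)


class Process:
    def __init__(self, rank, logger):
        self.rank = rank
        self.logger = logger
        self.send_seq = 0
        self.recv_seq = 0
        self.sent_log = {}    # (dest rank, send_seq) -> payload, sender memory
        self.delivered = []   # payloads in delivery order

    def send(self, dest, payload):
        self.send_seq += 1
        # Sender-based logging: the payload is kept by the sender.
        self.sent_log[(dest.rank, self.send_seq)] = payload
        dest.deliver(self.rank, self.send_seq, payload)

    def deliver(self, sender, send_seq, payload, replay=False):
        self.recv_seq += 1
        if not replay:
            # Only the small reception event goes to the event logger.
            self.logger.log(
                Determinant(sender, send_seq, self.rank, self.recv_seq))
        self.delivered.append(payload)


def recover(rank, logger, peers):
    """Rebuild a failed process by replaying logged deliveries in order."""
    fresh = Process(rank, logger)
    for det in logger.events_for(rank):
        payload = peers[det.sender].sent_log[(rank, det.send_seq)]
        fresh.deliver(det.sender, det.send_seq, payload, replay=True)
    return fresh


logger = EventLogger()
p0, p1, p2 = (Process(r, logger) for r in range(3))
p0.send(p2, "a")
p1.send(p2, "b")
p0.send(p2, "c")

# Simulate the loss of process 2 and rebuild it from the logs.
restored = recover(2, logger, {0: p0, 1: p1})
print(restored.delivered == p2.delivered)
```

In this sketch the event logger is a single object shared by all processes, which corresponds to the centralized design the abstract describes as the usual assumption in existing work.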

Dates and versions

hal-00727470, version 1 (03-09-2012)

Cite

Thomas Ropars, Christine Morin. Active optimistic and distributed message logging for message-passing applications. Concurrency and Computation: Practice and Experience, 2011, 23 (17), pp.2167-2178. ⟨10.1002/cpe.1775⟩. ⟨hal-00727470⟩