Differentiated consistency for worldwide gossips

—Eventual consistency is a consistency model that favors liveness over safety. It is often used in large-scale distributed systems where models ensuring stronger safety incur performance costs too high to be deemed practical. Eventual consistency tends to be uniformly applied within a system, but we argue a demand exists for differentiated eventual consistency, e.g., in blockchain systems. We propose UPS to address this demand. UPS is a novel consistency mechanism that works in tandem with our novel two-phase epidemic broadcast protocol GPS to offer differentiated eventual consistency and delivery speed. We propose two complementary analyses of the broadcast protocol: a continuous analysis and a discrete analysis based on compartmental models used in epidemiology. Additionally, we propose the formal definition of a scalable consistency metric to measure the consistency trade-off at runtime. We evaluate UPS in two simulated worldwide settings: a one-million-node network and a network emulating that of the Ethereum blockchain. In both settings, UPS reduces the inconsistencies experienced by a majority of the nodes and reduces the average message latency for the remaining nodes.


1 INTRODUCTION
Today's distributed computer systems have reached unprecedented sizes. Modern data centers routinely comprise tens of thousands of machines [2], [3]; social networks and IoT services combine cloud and edge resources into world-spanning applications [4], [5]; blockchains [6]-[8] execute on tens of thousands of machines organized in peer-to-peer networks [9], [10]. Such enormous scales come with a host of challenges that distributed-system researchers have focused on over the last decades. One key challenge arises from the inherent tension between performance, consistency, and fault tolerance, as captured by the CAP impossibility theorem [11].
To overcome this fundamental barrier, large-scale distributed systems often adopt an eventual consistency model [5], [12] for their data [13], [14]. Intuitively, eventual consistency allows the replicas of a distributed shared object to temporarily diverge, as long as they eventually converge back to a unique global state. Formally, this global consistent state should be reached once updates on the object stop (with additional constraints usually linking the object's final value to its sequential specification) [15]. In a practical system, a consistent state should be reached every time updates stop for long enough [5]. How long is long enough depends on the properties of the underlying communication service, notably on its latency and ordering guarantees. These two fundamental properties stand in a natural trade-off, where latency can be traded for better (probabilistic) ordering properties [16]-[18]. This inherent tension builds a picture in which an eventually consistent object must find a compromise between speed (how fast changes are visible to other nodes) and consistency (to which extent different nodes agree on the system's state). This tension is exemplified in Fig. 1, which shows processes p1 and p2 using a distributed append-only queue q, i.e., an abstract representation of a blockchain [19, §16.7]. A queue q supports two operations: append(x) adds the integer x to the queue, and read() returns the content of q. In Fig. 1, p1 and p2 eventually converge to the same consistent global state (1, 2) that includes all modifications: q.append(1) by p1 and q.append(2) by p2, in this order. However, p2 experiences a temporary inconsistent state when it reads (2): this read "misses" p1's append operation, which has been ordered before that of p2, and is thus inconsistent with the final state (1, 2). Process p2 could delay its first read operation to increase its chances of receiving p1's append operation on time (dashed circle), ultimately avoiding this inconsistency. However, such delays naturally impede change propagation across processes and are hard to set correctly.

• D. Frey and F. Taïani are with Univ Rennes, CNRS, Inria, IRISA, Rennes F-35000, France. E-mails: davide.frey@inria.fr, francois.taiani@irisa.fr.
• A. Mostefaoui and M. Perrin are with LS2N, Université de Nantes, Nantes F-44300, France. E-mails: achour.mostefaoui@univ-nantes.fr, matthieu.perrin@univ-nantes.fr.
• P.-L. Roman is with École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland. E-mail: pierre-louis.roman@epfl.ch.

A preliminary version of this paper appears in the proceedings of SRDS 2016 [1]. Please refer to the Appendix on page 15 for a summary of the changes.
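The Fig. 1 scenario can be replayed in a few lines of Python. This is our own illustration: updates are tagged with (clock, id, value) triples and tie-broken by (clock, id), an assumption borrowed from the Lamport-clock construction used later in Alg. 1.

```python
# Toy replay of the Fig. 1 scenario: every replica orders the updates it
# has received by (clock, id), so all replicas converge to (1, 2) even
# though p2 temporarily reads (2) before p1's update arrives.
def queue_state(updates):
    """Deterministically order updates by (clock, id) and build the queue."""
    return tuple(value for _, _, value in sorted(updates))

u1 = (1, 1, 1)  # q.append(1) by p1, logical clock 1
u2 = (1, 2, 2)  # q.append(2) by p2, concurrent with u1

print(queue_state([u2]))       # p2's early read, before u1 arrives: (2,)
print(queue_state([u1, u2]))   # converged state on every replica: (1, 2)
```

The deterministic tie-break is what lets both processes agree on (1, 2) without coordinating: it plays the role of the total order discussed in §2.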
This tension between speed and consistency in systems relying on eventual consistency is generally resolved by uniformly selecting one trade-off point that all nodes in a system adhere to [5], [16]. However, large-scale systems routinely need to cater to diverging requirements, e.g., due to hardware heterogeneity or applications offering heterogeneous services. For instance, blockchain systems are composed of miners and clients with different priorities. Miners are incentivized to receive data as fast as possible [20]-[22], while clients mainly seek security, i.e., consistency [23], [24].
We argue in this paper for differentiated levels of consistency to satisfy the heterogeneity inherent to large-scale systems. Specifically, we focus on the implementation of eventually consistent replicated data structures in large message-passing distributed systems. We investigate how different levels of consistency can be provided in this context using a novel, judiciously crafted, epidemic broadcast protocol. Unlike past hybrid consistency conditions [25]-[29], our solution differentiates eventual consistency in and of itself to favor speed for specific nodes (e.g., blockchain miners) and consistency for others (e.g., blockchain clients). It leverages the internal dynamics of probabilistic gossip/epidemic protocols [30], [31], whose scalability has been put to use in a broad range of recent systems [8], [32]-[34].
Evaluating such a protocol in practice raises, however, an important methodological point: how to measure consistency. Consistency conditions are typically defined as predicates on execution histories; a system execution is either consistent or it is not, leaving no room for quantification. Practitioners who deploy eventual consistency are interested in the current level of consistency of a running system, i.e., how far the system currently is from a consistent situation. Quantitatively measuring the inconsistencies in a system is not straightforward. We can measure the level of agreement between nodes by counting how many nodes see the same state [5], but this approach can lead to paradoxes. For instance, the system in Fig. 1 appears close to agreement if most nodes read the same state (2) as p2, even though (2) is inconsistent with the final converged state (1, 2).
This paper makes the following contributions:
• Update-query consistency with Primaries and Secondaries (UPS), a novel consistency mechanism offering two levels of eventual consistency in a system (§3). UPS extends the update-query consistency protocol [15] by exploiting a novel two-phase epidemic broadcast, Gossip Primary-Secondary (GPS), which involves two classes of nodes. Primary nodes (Primaries in short) seek to receive object updates as fast as possible, while Secondary nodes (Secondaries in short) strive to minimize the inconsistencies they perceive. Fig. 2 summarizes these goals.
• A formal analysis of GPS's latency and network overhead via a continuous analysis, as well as of its consistency via a compartment-based discrete analysis (§4).
• A scalable consistency metric to assess the inconsistency observed by Primaries and Secondaries using UPS (§5).
• Experimental evaluation of the consistency and latency properties of UPS, both in a large-scale simulated one-million-node network and in a network simulating the Ethereum blockchain [7] (§6). We show in both settings that the cost paid by each node class is small compared to an undifferentiated system: Primaries experience a lower latency with levels of inconsistency close to those of undifferentiated nodes, while Secondaries observe fewer inconsistencies at a minimal latency cost. Besides, GPS only incurs a small overhead in message complexity, proportional to the ratio of Primaries in the network.

Linearizability
Linearizability [35] ensures that a replicated data structure behaves in a way that is indistinguishable from that of a single shared copy. It typically imposes a total order on the operations applied to the data structure and is therefore considered one of the strongest available consistency criteria. Linearizability provides a convenient abstraction to developers but comes with great costs in large-scale systems. Failures are common in these systems and communication delays are typically unbounded, i.e., the network is said to be asynchronous. These two constraints make it impossible to emulate any linearizable data structure without abandoning fundamental properties such as availability or partition tolerance, a trilemma elegantly captured by the CAP theorem [11]. Worse, Attiya and Welch have proved [36] that even in failure-free synchronous systems, the implementation of strongly consistent data structures (e.g., stack, queue, set) implies execution times for local operations that are linear with respect to network latency, resulting in poor performance.

Weak Consistency Conditions
Weaker consistency conditions overcome the limits and costs of linearizability by striking a balance between agreement, speed, and dynamicity within a system. Such conditions include PRAM consistency, causal consistency [37]-[39], and eventual consistency [12]. PRAM and causal consistency expect the local histories observed by each process to be plausible, regardless of the other processes, but they do not impose state convergence. On the other hand, eventual consistency tolerates temporary inconsistent states but requires that all replicas eventually converge to the same state.
Unlike linearizability, these weaker consistency conditions can be implemented in any distributed system where communication is possible, even when facing unbounded delays (asynchrony) and node failures. In addition, their local time complexity (i.e., the time taken by one given node to execute an operation locally) does not depend on communication delays, allowing for efficient implementations.

Implementing Consistency: the Role of Broadcast
In message-passing distributed systems, a replicated data structure is typically implemented by (1) hosting replicas of the shared data on each process, and (2) leveraging a broadcast primitive to disseminate the operations that processes execute on their copy of the shared data structure. The consistency level exhibited by the resulting data structure is tightly related to the guarantees of the underlying broadcast primitive. Linearizability typically requires a total-order broadcast, which is computationally equivalent to consensus [40], [41]. By contrast, weaker consistency conditions can be implemented using weaker broadcast abstractions. Causal broadcast can be used to implement a causal memory, and reliable broadcast is the communication primitive indicated to implement eventual consistency [42].

Update Consistency
As we hinted at in §1, eventual consistency requires replicated objects to converge to a globally consistent state when update operations stop for long enough. On its own, this condition turns out to be too weak for most practical purposes, as the convergence state need not depend on the operations performed on the object. For this reason, actual implementations of eventually consistent objects usually refine eventual consistency by linking the convergence state to the object's sequential specification.
In this paper, we focus on one such refinement: update consistency [15]. Consider the append-only queue object of Fig. 1; its sequential specification consists of two operations:
• append(x) appends the value x at the end of the queue;
• read() returns the sequence of all the elements ever appended, in their append order.
Update consistency requires the final converged state to result from a total ordering of all the append operations of all agents. This ordering must also respect the order in which each agent issues its append operations (a.k.a. the process order). For example, the scenario in Fig. 1 satisfies update consistency since the final convergence state results from the ordering ⟨q.append(1), q.append(2)⟩. An equivalent definition [15] states that an execution history respects update consistency if it contains an infinite number of updates, or if a finite number of reads can be removed from it such that the resulting pruned history is sequentially consistent. In Fig. 1, this is achieved by removing the read() that returns (2).
Alg. 1 shows an algorithm from [15] that implements the update-consistent append-only queue of Fig. 1. Alg. 1 pairs a broadcast primitive (cf. Line 8) with Lamport clocks [43] to reconstruct a total order of operations after the fact [15]. Relying on this after-the-fact total order allows update consistency to support non-commutative operations such as append without requiring type-constrained CRDTs [13].
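The core idea behind Alg. 1 can be sketched in Python as follows. This is a simplified rendering under our own naming: the broadcast primitive of Line 8 is replaced here by a trivial deliver-to-all callback, and fault tolerance is ignored.

```python
class UCQueue:
    """Sketch of an update-consistent append-only queue (in the spirit
    of Alg. 1): appends are broadcast with Lamport timestamps, and reads
    rebuild the queue from the total order given by (clock, sender id)."""
    def __init__(self, node_id, broadcast):
        self.node_id = node_id
        self.clock = 0
        self.updates = set()        # set of (clock, id, value) triples
        self.broadcast = broadcast  # stand-in for the Line 8 primitive

    def append(self, value):
        self.clock += 1
        self.broadcast((self.clock, self.node_id, value))

    def deliver(self, update):      # called upon broadcast delivery
        self.clock = max(self.clock, update[0])  # Lamport clock merge
        self.updates.add(update)

    def read(self):                 # after-the-fact total order
        return [v for _, _, v in sorted(self.updates)]

# Minimal wiring: a "broadcast" that synchronously delivers to all nodes.
nodes = []
def bcast(update):
    for n in nodes:
        n.deliver(update)

q1, q2 = UCQueue(1, bcast), UCQueue(2, bcast)
nodes.extend([q1, q2])
q1.append("a")
q2.append("b")
assert q1.read() == q2.read() == ["a", "b"]  # replicas agree
```

With a delayed (rather than synchronous) broadcast, replicas would temporarily diverge before agreeing on the same (clock, id)-ordered sequence, which is precisely the transient inconsistency measured in §5.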

Problem Statement
The critical feature of update consistency lies in its ability to precisely define the nature of the converged state the system should reach once all updates have been issued. In practice, however, the intermediate behavior of a system before it converges (i.e., before it reaches "consensus") is equally critical for its usability [5]. Focusing on this intermediate behavior raises two key challenges. First, existing eventually-consistent systems tend to treat all nodes uniformly, leading to a non-converged behavior that does not cater for the specific needs of individual participants [14], [44]. Second, studying a system's behavior before convergence requires the ability to quantify how well- or ill-converged this system is. Unfortunately, measuring the inconsistency level of non-converged states remains an ill-defined exercise. Despite being easy to compute in a broad range of situations, existing system-oriented metrics do not consider the ordering of update operations (append in our case) [5], [45]-[47]. On the other hand, theoretical metrics require global system knowledge [48], which makes them impractical at large scale.
Algorithm 1: Update consistency for an append-only queue.

Fig. 3: Two sorts of speeds: latency (λ) and jitter (δ).

This paper addresses both challenges. First, we propose a novel scalable broadcast protocol that, when used in Line 8 of Alg. 1, satisfies update consistency and supports different consistency levels for read operations that occur before
convergence.¹ Specifically, we exploit the trade-off between delivery speed and consistency to offer different levels of inconsistency within the same system: we distinguish a small fraction of Primary nodes that should receive fast, albeit possibly inconsistent, information, from a significant fraction of Secondary nodes that should only receive stable consistent information, albeit more slowly. Second, we propose a novel metric to measure the level of inconsistency of an append-only queue, and use it to evaluate our protocol.

System Model
We consider a large set of nodes p1, ..., pN that communicate using point-to-point messages. Any node can communicate with any other node, given its identifier. We use probabilistic protocols that are naturally robust to crashes and message losses, but we do not consider these aspects in the rest of the paper for simplicity. Nodes are categorized in two classes: a small number of Primary nodes (Primaries) and a large number of Secondary nodes (Secondaries). The class of a node is a parameter set by the application that captures the node's requirements in terms of update consistency: Primaries should perceive updates as fast as possible, while Secondaries should experience as few inconsistencies as possible.
1. Importantly, "consistency level" here does not refer to the time required to achieve (eventual) agreement, but to the transient level of disagreement before agreement is achieved. This transient level of disagreement has a direct impact on the quality of service and trust users can put in a system, and is therefore of prime practical importance [5].

Intuition & Overview
We have repeatedly referred to the inherent trade-off between speed and consistency in eventually consistent systems. This trade-off may appear contradictory: if Primaries receive updates faster, why should they not also experience higher levels of consistency? This apparent paradox arises because we have so far silently conflated speed and latency. The situation within a broadcast is in fact more subtle and involves two sorts of speeds (Fig. 3): latency (λ, shown in the figure as the mean delay t_m − t_0 over all nodes, where t_m is the mean reception time) is the time a message m takes to reach individual nodes, from the point in time of m's sending (t_0). Jitter (δ), by contrast, is the delay between the first (t_1) and the last receipt (t_f) of a broadcast.² Inconsistencies typically arise in Alg. 1 when some updates have only partially propagated within a system, and are thus predominantly governed by the jitter δ rather than the average latency λ.
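As a concrete reading of Fig. 3's quantities, the following helper computes λ and δ from a send time and per-node reception times (our own illustration; the function and variable names are assumptions):

```python
def latency_and_jitter(t0, receptions):
    """Given the send time t0 and each node's reception time, return
    λ (mean delay t_m - t_0 over all nodes) and δ (spread t_f - t_1
    between the first and last receipt)."""
    lam = sum(t - t0 for t in receptions) / len(receptions)
    delta = max(receptions) - min(receptions)
    return lam, delta

# A broadcast sent at t0 = 0 and received at times 1, 2 and 4:
lam, delta = latency_and_jitter(0.0, [1.0, 2.0, 4.0])
# lam = 7/3 (mean delay), delta = 3.0 (last receipt minus first)
```

A protocol can shrink δ (tighter, better-ordered receptions) while slightly growing λ, or vice versa, which is exactly the lever GPS pulls per node class.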
The gossip-based broadcast protocol we propose, Gossip Primary-Secondary (GPS), exploits this distinction and offers different δ/λ trade-offs. Primaries have a reduced λ, thus accelerating update receptions, but also a slightly increased δ, while Secondaries have a reduced δ, thus increasing consistency by improving the order of update receptions, at the cost of a slightly higher λ.
Intuitively, GPS uses the set of Primaries as a sort of message "concentrator" that accumulates copies of an update u before collectively forwarding it to Secondaries in a better order. Fig. 4 depicts the main phases of GPS: (1) a new update u is first sent to Primaries; (2) Primaries disseminate u among themselves; (3) once most Primaries have received u, they forward it to Secondaries; (4) finally, Secondaries disseminate u among themselves.
A key difficulty in this sequence consists in deciding when to switch from Phase 2 to Phase 3. A coordinated transition would require a global synchronization mechanism, which is costly and generally impracticable in very large systems. Instead, GPS relies on a less accurate but scalable local procedure based on broadcast counts: each Primary node decides locally when to start forwarding to Secondary nodes.
For completeness' sake, let us note that since Phase 1 exclusively relies on Primaries, GPS stops working if all Primaries fail. In cases where such a systemic catastrophic failure is plausible, GPS requires additional fault-tolerance mechanisms. These mechanisms are, however, deemed out of scope for the current paper.
2. In most large-scale epidemic broadcast scenarios, t_1 − t_0 corresponds to a single communication exchange and is therefore small in comparison to λ, which is driven by the average path length in the random gossiping graph. As a result, λ and δ tend to capture the same underlying dynamics, and δ is ignored.
Algorithm 2: Gossip Primary-Secondary (GPS) for a node.

Algorithm 3: Uniform gossip for a node (baseline).

The GPS Algorithm
Alg. 2 shows the pseudo-code of GPS. GPS follows the standard model of reactive epidemic broadcast protocols [49], [50]. Each node keeps a history of the messages received so far (in the R variable, Line 8), and decides whether to retransmit a received broadcast to fanout other nodes based on its history. However, contrary to a standard epidemic broadcast, GPS handles Primaries and Secondaries differently:
• First, GPS uses two instances of random peer sampling (RPS) protocols [51], [52] (Lines ). Primaries use this count to detect duplicates, and forward a message to fanout Secondaries (Line 23) when a duplicate is received for the first time, thus triggering Phase 3.
In short, we can classify Primaries as infect twice and die and Secondaries as infect and die. For comparison, a standard infect and die gossip without classes is shown in Alg. 3; we refer to it as Uniform gossip. Uniform gossip serves as the baseline for our analysis (§4) and our experiments (§6).
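The two reaction rules ("infect twice and die" for Primaries, "infect and die" for Secondaries) can be sketched as follows. This is our own simplification of Alg. 2: full class memberships stand in for RPS views, and `send` is an abstract delivery callback.

```python
import random

def make_gps_node(is_primary, fanout, primaries, secondaries, send):
    """Return the receive handler of one GPS node (sketch of Alg. 2)."""
    copies = {}  # message -> number of copies received so far
    def on_receive(msg):
        copies[msg] = copies.get(msg, 0) + 1
        targets = None
        if is_primary and copies[msg] == 1:
            targets = primaries        # Phase 2: gossip among Primaries
        elif is_primary and copies[msg] == 2:
            targets = secondaries      # first duplicate: trigger Phase 3
        elif not is_primary and copies[msg] == 1:
            targets = secondaries      # Phase 4: classic infect-and-die
        if targets is not None:        # further duplicates are ignored
            for j in random.sample(targets, min(fanout, len(targets))):
                send(j, msg)
    return on_receive

# Single-primary illustration: the first copy goes to the Primary view,
# the first duplicate triggers the forward to Secondaries, then silence.
sent = []
primary = make_gps_node(True, 1, primaries=[10], secondaries=[20],
                        send=lambda j, m: sent.append(j))
primary("u"); primary("u"); primary("u")
assert sent == [10, 20]
```

The duplicate-triggered switch is what makes the Phase 2 to 3 transition purely local: no Primary needs to know how many other Primaries already hold the message.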

4 FORMAL ANALYSIS OF GPS
In the following, we analyze the expected performance of GPS and compare it to Uniform gossip in terms of message complexity and latency in §4.1, then analyze the behavior of nodes in GPS using a compartmental model in §4.2. Our analysis uses the following notation:

Asymptotic Continuous Analysis
We proposed an asymptotic analysis of GPS in a preliminary version of this paper [1, §III.D]. This analysis shows that Primaries receive messages log_f(1/d) rounds earlier than nodes in Uniform gossip and that, assuming a small density d, GPS induces small overheads both in message latency for Secondaries and in message complexity. We summarize here the main findings to compare them with our evaluation results later on.
The latency of Primaries is log_f(1/d) rounds smaller compared to nodes in Uniform gossip:

λ_P ≈ log_f(d·N) = log_f(N) − log_f(1/d).  (1)

Assuming a small fraction of Primaries (i.e., d ≪ 1), the latency of Secondaries can be approximated as

λ_S ≈ log_f(N) + C,  (2)

with C a constant independent of N. Since Primary nodes gossip twice (once towards other Primary nodes at Line 22 of Alg. 2, and once towards Secondary nodes at Line 23), and Secondary nodes gossip once (Line 25), the message complexity of GPS can be approximated as

M_GPS ≈ (1 + d) · M_Uniform.  (3)

Eq. (3) shows that GPS induces a fraction d of additional messages compared to Uniform gossip, e.g., a network with 1% Primaries induces only 1% more messages.
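A quick numeric sanity check of the message-complexity statement, using only the counts given in the text (Primaries gossip twice with fanout f, Secondaries once); the function name is our own:

```python
def gps_overhead(d):
    """Fraction of extra messages of GPS over Uniform gossip: a density
    d of Primaries sends 2*d*N*f messages and the (1-d)*N Secondaries
    send (1-d)*N*f, versus N*f for Uniform gossip, i.e. (1+d)*N*f
    overall. N and f cancel out of the ratio."""
    return (2 * d + (1 - d)) - 1.0

assert abs(gps_overhead(0.01) - 0.01) < 1e-12  # 1% Primaries -> 1% extra
assert abs(gps_overhead(0.10) - 0.10) < 1e-12  # overhead scales with d
```

The overhead is thus linear in the Primary density d and independent of both the network size N and the fanout f.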

Compartment-based Discrete Analysis
We next analyze the dissemination of messages in GPS using a discrete-time compartmental model from epidemiology. Compartmental models split the population into compartments (or states) and analyze population flows between compartments based on transition rates. The simple three-compartment model Susceptible-Infectious-Recovered (SIR) [53] can be used to analyze Uniform gossip: at first every node is susceptible, then becomes infectious upon the first reception of a message m, gossips m as a result, and recovers (or dies) by ignoring m thereafter. The SIR model defines the infection rate β and recovery rate γ such that the population follows the sequence S →β I →γ R. GPS distinguishes two populations, Primaries and Secondaries, and thus cannot be analyzed using a straightforward SIR model. Since Primaries first gossip to Primaries and then to Secondaries, intuitively, a fitting model should contain two infectious compartments, e.g., SISIR, while Secondaries follow the classic SIR sequence.
Model. We assume a synchronous execution, in which messages sent in round r are received in round r + 1. Each round r unfolds in two phases: first, messages from round r − 1 are received, then messages from round r are sent.
We reuse some of the notation from SIR, e.g., β and γ, but introduce our own notation for compartments for readability. We identify the compartments based on node class and the number of messages sent or received by nodes. We note X_{i,j} the compartment of nodes of class X (with X ∈ {P, S}) that have received i copies of a message and sent j messages. For Primaries i ∈ {0, 1, 2+}, while for Secondaries i ∈ {0, 1+}, where y+ means y or more; and j ∈ {0, 1, 2}, where j = 2 is only used for Primaries.
Fig. 5 represents all compartments as boxes, and events, i.e., the reception or sending of one or multiple messages, as arrows between the compartments. For instance, the dashed arrow from P_{0,0} to P_{2+,0} corresponds to the situation in which a susceptible Primary receives 2 or more messages in the same round. Similarly, the solid arrow from P_{2+,0} to P_{2+,2} corresponds to the sending of two messages (to f other Primaries and f Secondaries). The number of messages sent or received along an arrow is derived from the compartments connected by this arrow, e.g., 2 sendings in the latter example. Nodes in blue compartments send messages in the second phase of a round and hence change compartment.
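As a point of reference for this compartmental reading, the SIR dynamics of Uniform gossip can be simulated directly. The following round-based Monte Carlo sketch is our own illustration, not the paper's analytical model:

```python
import random

def simulate_sir_gossip(n, fanout, seed=1):
    """Round-based infect-and-die gossip: each node infected in round r
    gossips to `fanout` uniformly chosen targets in round r+1, then
    recovers. Returns (rounds elapsed, nodes eventually reached)."""
    rng = random.Random(seed)
    state = ["S"] * n
    state[0] = "I"                      # the broadcast source
    rounds = 0
    while "I" in state:
        for i in [k for k, s in enumerate(state) if s == "I"]:
            state[i] = "R"              # infect and die
            for j in rng.sample(range(n), fanout):
                if state[j] == "S":
                    state[j] = "I"      # will gossip next round
        rounds += 1
    return rounds, state.count("R")

rounds, reached = simulate_sir_gossip(1000, 4)
# With fanout 4, the vast majority of nodes are reached within a number
# of rounds on the order of log_4(n).
```

GPS extends this picture with a second infectious stage for Primaries, which is why the analysis below needs compartments indexed by both receptions and sendings.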

Primary nodes
By construction, a Primary node never sends more than 2 messages, and never more than the number of messages it has seen so far, i.e., i ≥ j in the above notation. As a result, the compartments of interest for Primaries are P_{0,0}, P_{1,0}, P_{2+,0}, P_{1,1}, P_{2+,1}, and P_{2+,2}.
Initially, all Primaries are in the P_{0,0} compartment since they have not received any message, which mirrors the susceptible compartment of SIR. Symmetrically, nodes are in P_{2+,2} when they have received 2 or more messages, and sent 2 (the first to other Primaries, and the second to Secondaries). These nodes will not send any more messages (will not "infect" any other node), and P_{2+,2} can therefore be interpreted as the compartment of recovered nodes of SIR.
For the compartments P_{0,0}, P_{1,1} and P_{2+,2} (filled in white in Fig. 5a), we note P^r_{i,j}, with j = min(2, i), the number of nodes in P_{i,j} at the start of round r.
The remaining compartments require a somewhat different treatment. This is because we have assumed that each round unfolds in two phases: first, messages from round r − 1 are received, then messages from round r are sent. Because the sending of messages is deterministic (e.g., all nodes in compartment P_{1,0} send a message in the second phase of a round), the nodes present in the three compartments P_{1,0}, P_{2+,0} and P_{2+,1} (filled with light blue in Fig. 5a) always send messages in the second phase of a round, and therefore change compartments. As a result, P_{1,0}, P_{2+,0} and P_{2+,1} are systematically empty between two rounds. By convention, for these transitory compartments, we note P^r_{i,j} the number of nodes in compartment P_{i,j} just after the first phase of round r, and before the sending of any message.

We are interested in the evolution of P^r_{0,0}, P^r_{1,1} and P^r_{2+,2} in each round. Because the number of Primaries, |P|, remains constant, we can derive P^r_{2+,2} from P^r_{0,0} and P^r_{1,1} using |P| = P^r_{0,0} + P^r_{1,1} + P^r_{2+,2}. In the following we therefore focus on P^r_{0,0} and P^r_{1,1}.

Theorem 1. The evolution of Primary nodes follows the formulae P^{r+1}_{0,0} = a(P^r_{0,0}, P^{r−1}_{0,0}) and P^{r+1}_{1,1} = b(P^r_{0,0}, P^{r−1}_{0,0}, P^r_{1,1}).

Proof. Please refer to the Appendix for the proof.

Secondary nodes
The compartments of Secondaries (Fig. 5b) are simpler, as Secondaries can only be in two states at the end of a round with respect to a message m: either they have not received m and not forwarded it (S_{0,0}), or they have received m once or more, and forwarded it once (S_{1+,1}). As for Primaries, S_{1+,0} (Secondaries that have received m but not yet forwarded it) is a transitory compartment, used for clarity, that is filled and emptied in the course of a round.

Numerical application
Figs. 6 to 8 use Thms. 1 and 2 to chart the behavior of GPS under various parameters. In particular, Figs. 6 and 7 confirm the asymptotic results of §4.1: a low density causes Primaries to receive a broadcast earlier, but has little impact on the dissemination to Secondaries, with the gain by Primaries of the form −log_f(d) (Eq. (1)). Fig. 8 confirms our intuition regarding the impact of density on the jitter/latency trade-off between Primaries and Secondaries: although a high Primary density leads to reduced latency gains for Primaries (x-axis, 'P' curve, d = 0.1 annotation), it also yields a reduced jitter for Secondaries (y-axis, 'S' curve, d = 0.1). In §6, we will see that this reduced jitter delivers an improved consistency for Secondaries, matching the intuition presented in Fig. 2.

5 CONSISTENCY METRIC
To assess more precisely how the UPS/GPS algorithm we are proposing (§3.3) can improve the intermediate behavior of an update-consistent data structure, we need to quantify how consistent this data structure is. To this end, we propose in this section a novel consistency metric that overcomes the weaknesses of existing solutions by (i) considering the ordering of operations, and (ii) being locally computable.

A General Consistency Metric
We first observe that the algorithm for an update-consistent append-only queue (cf. Alg. 1) guarantees that all its execution histories respect update consistency. To measure the consistency level of temporary states, we therefore evaluate how the history deviates from a stronger consistency model, sequential consistency [54]. An execution respects sequential consistency if it is equivalent to some sequential (i.e., totally ordered) execution that contains the same operations, and respects the sequential (process) order of each node.
Since update consistency relies itself on a total order, the gist of our metric consists in counting the number of read operations that do not conform to a total order of updates that leads to the final convergence state. Given one such total order, we may transform the execution into one that conforms to it by removing some read operations. In general, a data object may reach a given final convergence state through different possible total orders, and for each such total order we may have different sets of read operations whose removal makes the execution sequentially consistent. We thus measure the level of inconsistency by taking the minimum over these two degrees of freedom: the choice of the total order, and the choice of the set.
More formally, we define a set of temporary inconsistencies (or temporary inconsistency set for short) w.r.t. an execution Ex as a finite set of read operations E that, when removed from Ex, makes it sequentially consistent.³ We denote the set of all the temporary inconsistency sets of an execution Ex over all total orders by TI(Ex). We then define the relative inconsistency RI of an execution Ex as the minimal number of read operations that must be removed from Ex to make it sequentially consistent:

RI(Ex) = min { |E| : E ∈ TI(Ex) }.

For example, in Fig. 1, several total orders (of all operations) can lead to a sequentially consistent execution, once pruned of problematic read operations. If we consider the total order ⟨q.append(1), q.append(2), q.read(1), q.read(2), q.read(1, 2), q.read(1, 2)⟩, then the execution becomes sequentially consistent by removing q.read(1) and q.read(2). By contrast, with ⟨q.append(1), q.read(1), q.append(2), q.read(2), q.read(1, 2), q.read(1, 2)⟩, removing just q.read(2) suffices to make the execution sequentially consistent. As the execution itself is not sequentially consistent, the level of inconsistency in Fig. 1 is therefore 1.
The metric RI is particularly adapted to compare the consistency level of implementations of update consistency: the lower, the more consistent. In the best-case scenario where Ex is sequentially consistent, TI(Ex) contains the empty set, resulting in RI(Ex) = 0. In the worst-case scenario where the execution never converges (i.e., some nodes indefinitely read incompatible local states), every set of reads that would need to be removed to obtain a sequentially consistent execution is infinite. Since TI only contains finite sets of reads, TI(Ex) = ∅ and RI(Ex) = +∞.

Update-consistent Append-only Queue
In general, RI(Ex) is complex to compute, in contrast to more practice-oriented proposals [5], [45]-[47]: one must consider all possible total orders of events that can fit sequential consistency, and all possible finite sets of reads to identify temporary inconsistency sets. But for an append-only queue implemented with Alg. 1, we can easily show that there exists exactly one minimal set of temporary inconsistencies for executions in which a convergence state is reached.
To understand why, we first observe that the append operation is non-commutative. This implies that there exists a single total order of append operations that yields a given final convergence state. Second, Alg. 1 guarantees that the size of the successive sequences read by a node can only increase, and that read operations always reflect the writes made on the same node. Consequently, in order to obtain a sequentially consistent execution, it is necessary and sufficient to remove all the read operations that return a sequence that is not a prefix of the sequence read after convergence. These read operations constitute the minimal set of temporary inconsistencies TI_min.
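This prefix characterization makes RI directly computable for Alg. 1's queue. A short sketch, with helper names of our own:

```python
def ti_min(reads, converged):
    """Minimal temporary inconsistency set for an append-only queue:
    the reads whose result is not a prefix of the converged sequence."""
    return [r for r in reads if list(r) != list(converged[:len(r)])]

# Reads observed in Fig. 1, with converged state (1, 2):
reads = [(1,), (2,), (1, 2), (1, 2)]
inconsistent = ti_min(reads, (1, 2))
assert inconsistent == [(2,)]   # only the read returning (2) must go
ri = len(inconsistent)          # RI = 1, matching the value above
```

Because each node only needs its own reads and the converged sequence, this computation is local, which is what makes the metric scalable.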

E V A L U A T I O N
In this section, we evaluate GPS and UPS in order to answer the following three research questions (RQs): RQ1: How does UPS compare against a uniform update consistency in terms of consistency (§6.4), message latency (§6.5), and message complexity (§6.6) when experimented on a one-million-node network? RQ2: How do the experimental results for RQ1 fare against the expected results from the continuous and compartment-based analyses? (§6.4-§6.6) RQ3: How does GPS perform when applied to a practical use case, namely a blockchain network? (§6.7) This evaluation explores two complementary large-scale setups. We opt for an extreme system size (1,000,000 nodes) in RQ1 to analyze the behavior of UPS and GPS well beyond the size of most existing P2P systems. In doing so, we aim to inform the design of P2P systems at scales that do not exist yet, but might appear in the future. By comparison, in RQ3, we study the impact of GPS in a concrete present-day P2P system (of ≈23,000 nodes) using recent public datasets.
We first detail the evaluation methodology in §6.1- §6.2 and summarize the results for RQ1 in §6.3.

Methodology Common to RQ1 & RQ3
We evaluate UPS and GPS using PeerSim [55], a well-known simulator for large-scale peer-to-peer networks. Both code and results are available online for reproducibility [56].

RPS implementation.
Both GPS and Uniform gossip require all nodes to execute RPS protocols to fill their respective network views (e.g. Line 10 and Line 11 of Alg. 2), which serve as peer pools to select gossip targets (Line 14 of Alg. 2).
We implement the two RPSs for GPS (i.e. one per node class) and the one for Uniform gossip as global oracles that fill the views of every node with other nodes sampled randomly from the entire network. Typical RPS implementations [57], [58] differ in that they fill one node's view by only sampling its nearest nodes in the topology (e.g. neighbors of neighbors) for practical reasons, thus taking longer to produce randomness comparable to that of an oracle.
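The oracle behavior described above can be sketched as follows. The sketch is ours (names such as `fill_views` are not PeerSim's API): a global oracle fills each node's view with peers drawn uniformly from the whole network, which is what a perfect RPS converges to.

```python
# Sketch (ours) of the global-oracle RPS used in the simulations:
# every node's view is a uniform random sample of the entire network.
import random

def fill_views(nodes, view_size, rng=None):
    """Fill each node's view with `view_size` peers sampled uniformly
    from the whole network, excluding the node itself."""
    rng = rng or random.Random(42)
    views = {}
    for n in nodes:
        others = [m for m in nodes if m != n]
        views[n] = rng.sample(others, min(view_size, len(others)))
    return views

views = fill_views(list(range(1000)), view_size=100)
assert all(len(v) == 100 and n not in v for n, v in views.items())
```

A practical RPS such as Cyclon-style view shuffling approximates this oracle only after enough shuffling rounds, which is why the paper substitutes the oracle for simplicity.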
Scenario. We consider a scenario where all nodes share an instance of an update-consistent append-only queue, as defined in §2.4. Following the definition of update consistency, nodes eventually converge to a strongly consistent state once they stop modifying the queue.
We opt for a scenario where 10 append(x) update operations, with x ∈ ℕ, are performed on the queue by 10 random nodes, with one update per round during the first 10 rounds. Local clocks start at 0 and are incremented every round. Each append is timestamped with the round number of its emission (cf. Line 7 of Alg. 1). All nodes also execute a read operation every round on their local copy of the queue.
We expect the system to experience two periods: (1) a temporary phase where updates are issued and disseminated, emulating a system continuously performing updates; then (2) a stabilized state once updates have finished propagating and most nodes have converged to a strongly consistent state.
Consistency metric. The 10 append operations of our scenario, which we note (append_i)_{i∈[0..9]} for simplicity, are eventually ordered according to the round in which they were sent (round i for operation append_i). As a result, a read operation by a node q at round r is inconsistent if the set of all append operations received by q so far does not form a prefix of the full (append_i)_{i∈[0..9]} sequence.
We refine the generic consistency metric proposed in §5 to suit our experimental setting. Since the number of updates is finite and each node reads the state of the queue at each round, we define a per-round metric to compare the evolution of the inconsistency of Primaries and Secondaries through time. For each round r, we define R_P(r) and R_S(r) as the sets of all the read operations performed at round r by Primaries and Secondaries, respectively. Using R_P(r) and R_S(r), we define the per-round inconsistency metrics Incons_P(r) and Incons_S(r) as the ratio of Primaries and Secondaries that observe an inconsistent read at round r (i.e. the corresponding reads are in TI_min): Incons_X(r) = |R_X(r) ∩ TI_min| / |R_X(r)| for X ∈ {P, S}. The metric for all nodes Incons_{P+S}(r) naturally follows: Incons_{P+S}(r) = (|R_P(r) ∩ TI_min| + |R_S(r) ∩ TI_min|) / (|R_P(r)| + |R_S(r)|). In the following, we focus on these instantaneous per-round metrics, which provide an equivalent yet more accurate view than the relative inconsistency of an experimental run, RI(Ex). In fact, RI(Ex) can be computed as follows: RI(Ex) = |TI_min| = Σ_r (Incons_P(r) · |R_P(r)| + Incons_S(r) · |R_S(r)|).
Statistical significance and plots. We evaluate each configuration 25 times and record the resulting distribution.
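The per-round metric can be computed directly from read logs. The sketch below is ours (data structures and names are illustrative, not the paper's code): it counts, for one class of nodes, the fraction of round-r reads that fall in TI_min.

```python
# Sketch (ours) of the per-round inconsistency metric Incons_X(r):
# the fraction of a class's reads at round r that belong to TI_min.

def incons(reads_at_r, ti_min):
    """reads_at_r: read results (tuples) by one node class at round r.
    ti_min: set of read results known to be temporary inconsistencies."""
    if not reads_at_r:
        return 0.0
    bad = sum(1 for r in reads_at_r if tuple(r) in ti_min)
    return bad / len(reads_at_r)

# Toy round: converged order is (1, 2); a read returning (2,) is not a
# prefix of it and hence lies in TI_min.
ti_min = {(2,)}
primaries = [(1,), (2,), (1, 2), (1, 2)]
secondaries = [(1,), (1, 2)]
incons_p = incons(primaries, ti_min)    # 1 bad read out of 4
incons_s = incons(secondaries, ti_min)  # 0 bad reads out of 2
```

Summing |R_P(r)|·Incons_P(r) + |R_S(r)|·Incons_S(r) over all rounds then recovers RI(Ex) = |TI_min|, as stated above.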

Methodology Specific to RQ1
We target a world-scale network to evaluate RQ1.
Network parameters. We use a network of 1,000,000 nodes, a fanout of 10, and an RPS view size of 100. These parameters yield a high broadcast reliability, i.e. a high probability that a node receives a message (cf. §6.6). Reliability could be improved further by increasing the fanout [49]; however, we choose powers of 10 for these parameters for readability.
Configurations. We use three configurations for GPS, one per density d of Primaries: 0.001, 0.01, and 0.1. In addition, we also evaluate Uniform gossip (cf. Alg. 3) as a baseline.

RQ1 -Summary of World-scale Experiments
Fig. 9 mirrors Fig. 2 presented in §1. Fig. 9 overviews the experimental results, depicting the trade-offs between consistency and latency for the different sets of nodes involved in our scenario. UPS configurations are each shown with two points: Primaries are shown using solid symbols, Secondaries using hollow ones. Uniform gossip is represented by a single black cross. The position on the x-axis shows the average update latency experienced by each set of nodes, and the y-axis their worst perceived level of inconsistency, i.e. the maximum Incons_X(r) value over all runs.
Fig. 9 shows UPS delivers the differentiated consistency/latency trade-offs we set out to achieve in §1: Secondaries enjoy higher consistency levels than they would in a uniform update-query consistency protocol, while paying only a small cost in terms of latency. The consistency boost strongly depends on the density of Primaries in the network, as we further show in §6.4, while the cost in latency does not, reflecting our analysis in §4.1. Primaries present the opposite behavior, i.e. Primaries' latency gains evolve in the reverse direction of Secondaries' consistency gains.
In short, we note a clear improvement of the maximum inconsistency level of Secondaries over the baseline and an equivalent but more volatile inconsistency level for Primaries over the baseline. In addition, Fig. 10b clearly shows the impact of the density on the consistency of Secondaries.

Fig. 10: Ratio of inconsistent nodes among Primaries (Fig. 10a) and Secondaries (Fig. 10b) using UPS for densities d ∈ {0.001, 0.01, 0.1} and a baseline using Uniform gossip (lower is better). Primaries reach a strongly consistent state faster than nodes in the baseline, with the same fraction of inconsistent nodes. The set of Secondaries is more consistent than the baseline; the higher the density, the more consistent.

Detailed results.
As expected, we observe an increase of inconsistent nodes during the temporary phase for all configurations and a return to a network-wide consistent state once every node has received every update. During the temporary phase, the inconsistency level of Primaries is equivalent to that of nodes in Uniform gossip, i.e. ≈4.6% of Primaries are inconsistent, with a higher variability for lower densities. In that phase, the inconsistency level of Secondaries is much lower than that of Primaries; we observe that the higher the density, the better: the number of inconsistent nodes remains under 1.0% with a density d = 0.1 but reaches ≈4.0% with a density d = 0.001.
The jitter (cf. §3.2) can be used to compare the consistency of different sets of nodes; a lower jitter results in a lower probability for a node to receive updates in the wrong order, i.e. a higher probability for a node to be consistent. Fig. 11 depicts the latency standard deviation, as a measure of jitter, experienced by Primaries and Secondaries using UPS and by all nodes using Uniform gossip. While Primaries experience similar jitter regardless of their density (0.656 for d = 0.001, 0.665 for d = 0.01, 0.666 for d = 0.1), which is on par with nodes using Uniform gossip (0.667), Secondaries' jitter decreases visibly as the density increases. These results confirm the causality between jitter and consistency.
Once enough Primaries are infected, the dissemination rapidly reaches all Secondaries. For instance, if the density is high enough, most Secondaries can receive an update at the same time from Primaries (via Line 23 of Alg. 2). Since Secondaries receive all messages within fewer rounds than Primaries, they are more consistent.
Comparison with analysis. Using Thms. 1 and 2 (cf. §4.2), we derive closed-form formulas for the predicted inconsistency levels of both classes of nodes in our evaluation.
Following the definition of our consistency metric in §6.1, we note Received^r_q the set of append operations received by q at the start of round r, and p^r_i the probability that operation append_i has been received by q at the start of r. We have p^r_i = 0 when r ≤ i, since append_i is broadcast during round i. Then, the probability that q has exactly received a prefix of length ℓ when round r starts is

Pr[ Received^r_q = {append_0, …, append_{ℓ−1}} ] = ∏_{i<ℓ} p^r_i × ∏_{ℓ≤i≤9} (1 − p^r_i).    (15)

As a result, the probability of a node performing an inconsistent read at round r is given by

Pr[ read^r_q ∈ TI_min ] = 1 − Σ_{ℓ=0}^{10} ∏_{i<ℓ} p^r_i × ∏_{ℓ≤i≤9} (1 − p^r_i),    (16)

where read^r_q is the read operation by process q at round r, X ∈ {P, S} denotes the class of node q, and TI_min and R_X(r) are the quantities defined in §6.1.
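Assuming the individual appends are received independently, the prefix-probability computation of Eqs. (15)-(16) can be checked numerically. The sketch below is ours, and the p values in it are illustrative rather than taken from the analysis.

```python
# Numeric sketch (ours) of Eqs. (15)-(16): a read is consistent iff the
# received set is exactly a prefix of the append order; summing the
# prefix probabilities over all lengths gives the consistency probability.
from math import prod

def inconsistent_read_prob(p):
    """p[i] = probability that append_i has been received; receipts are
    assumed independent. Returns 1 - sum_l prod(p[:l]) * prod(1-p[l:])."""
    n = len(p)
    consistent = sum(
        prod(p[:l]) * prod(1 - q for q in p[l:]) for l in range(n + 1)
    )
    return 1 - consistent

# Illustrative: the oldest append is very likely received, the newest barely.
print(inconsistent_read_prob([0.9, 0.5, 0.1]))
```

Note that `prod` over an empty slice is 1, which correctly handles the two extreme prefix lengths (nothing received, everything received).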
In Eq. (16), p^r_i can be derived from Thms. 1 and 2 using

p^r_i = 1 − X^{r−i}_{0,0} / |X|,    (17)

where X ∈ {P, S} stands for the class of node q. Fig. 12 plots Eq. (16) in which Eq. (17) has been injected for Primaries and Secondaries (solid colored lines) for a density d = 0.1, and overlays these theoretical predictions with the experimental results from Fig. 10 (dashed gray lines). It shows that the theoretical results closely follow the experimental ones up to a small experimental error (with an absolute discrepancy capped at 0.41% for Primaries in round 5).
As for the jitter, Fig. 11 mirrors with experimental data a subset of the theoretical findings of Fig. 8. Since Fig. 11 depicts only three densities, the oscillating behavior of Primaries' jitter is not apparent. Secondaries receive all updates up to 0.5 round later with GPS than with Uniform gossip, independently of the density. As a result, Primaries and Secondaries experience similar latency gains and losses, respectively, regarding their convergence to a consistent state, as shown in Fig. 10.

Comparison with analysis.
The experimental results for message latencies confirm our analysis. Fig. 13, which depicts experimental results, closely resembles Fig. 6 obtained from the compartmental analysis in §4.2. The latency gains for Primaries of 1, 2, and 3 rounds for densities of 10^−1, 10^−2 and 10^−3, respectively, correspond to the expected gains from Eq. (1). The small latency penalty of 0.5 round for Secondaries is also in line with Eq. (2). Both behaviors confirm the validity of the asymptotic analysis in §4.1.

RQ1 & RQ2 -Network Overhead
Tab. 1 presents three metrics to assess network overheads: (i) the number of messages sent over the network; (ii) the empirical broadcast reliability; and (iii) the normalized network overhead, i.e. the number of messages sent by GPS compared to Uniform gossip. All measures are averaged over 25 experimental runs. The metrics show close to no variability between runs: the standard deviation ranges between 0.0001% and 0.02% of the mean. The reliability remains high, above four nines, despite the low fanout, and barely differs between GPS and Uniform gossip. Given this reliability, GPS increases network overhead by factors of approximately 1.001, 1.01 and 1.1 compared to Uniform gossip for densities of 0.001, 0.01 and 0.1, respectively.
Comparison with analysis.The overhead in terms of number of messages sent by GPS confirms Eq. (3) in §4.1.

RQ3 -Applicability to Blockchain Networks
We now replace the gossip protocol of a blockchain system with GPS to evaluate its impact on a practical application.

Motivation.
Most existing permissionless blockchains rely on epidemic dissemination mechanisms to broadcast blocks, transactions, and smart contracts, and on probabilistic consensus mechanisms, such as the Nakamoto consensus [6], GHOST [59], or their many derivatives, to determine which blocks constitute the blockchain. These mechanisms fundamentally rely on eventual consistency to converge on the same view of the blockchain [60] and only provide strong consistency with high probability for old blocks in the blockchain (i.e. the "blockchain's common prefix" [61]). There are no consistency guarantees for recent blocks, as they may be replaced in the chain by other blocks (creating so-called blockchain forks) by attackers [62], [63] or even by well-behaving miners. Clients suffer from these inconsistencies as they directly affect their security [24]. Miners also suffer from these concurrent block creations, whose frequency is exacerbated by block propagation delays [23]. These delays induce lost mining time for miners, leading to a reduction in transaction throughput and ultimately in profitability [20].
We thus evaluate the use of GPS in the context of a blockchain network where miners (writers) are Primaries and clients (readers) are Secondaries. We focus on the following blockchain-related research questions (BRQs): BRQ1: Can GPS improve the consistency, and thus the security, of recent blocks for clients? BRQ2: Can GPS accelerate block propagation between miners, thus reducing lost mining time and ultimately enabling faster block creation? GPS can also be used to disseminate transactions and smart contracts. This could for instance help improve the ordering of transactions for clients, and raise the bar for double-spend attacks in fast payments [64]. Research questions tackling this usage of GPS are deemed future work.

Gossiping in blockchains.
We note that the gossip protocols used in networks such as Bitcoin and Ethereum offer few guarantees of (1) connectivity (the network graph may be partitioned, for instance by an attacker) or (2) reliability (a message may not be fully disseminated, thus increasing the likelihood of inconsistencies). This lack of security-related guarantees mostly results from the lack of a proper RPS. Both networks use a protocol similar to Uniform gossip: a built-in protocol for Bitcoin [65] and libP2P gossipsub for Ethereum [66]. Replacing these protocols with Uniform gossip or GPS, therefore, should not degrade the security of these systems. In fact, since both Uniform gossip and GPS assume the existence of an RPS, they should actually improve the connectivity and reliability of the dissemination.
We discuss in § 7 secure RPSs, required in adversarial networks such as blockchains, and avenues to secure GPS.
Target system. We chose to simulate the Ethereum system as it is a large-scale and well-studied system with a diverse set of available datasets documenting its behavior. Since we cannot always find accurate approximations of all the parameters required to properly simulate the Ethereum network, we sometimes use those of Bitcoin.
As described in §6.1, we use a scenario where 10 updates (blocks) are created (by miners). We evaluate two configurations, Uniform gossip and GPS, using the datasets described below to obtain realistic results.
Topology. We use 22,982 nodes in total, including 328 miners, following the most recent study on Ethereum [10, Tab. 2]. While the number of miners may seem low, note that miners may form mining pools. Since miners are Primaries when we evaluate GPS, this setup corresponds to a Primary density of 0.014. We use the associated distribution of miner computation power [10, Fig. 1], i.e. the probability for each miner to create a block, showing 90% of blocks are mined by only 14 entities and 99% by 45 entities.
Nodes are randomly interconnected following the outdegree distribution of the Bitcoin test network [9, Fig. 6a] with a mode of 8, a median of 13, and a maximum of 59.
Note that when we evaluate the impact of using the described realistic topology against an RPS-based topology, we only observe that the latter fully disseminates blocks slightly faster at the price of higher inconsistency variability.
Latencies.Inter-node block propagation latencies follow the heavy-tailed distribution of the Ethereum network [10] with a mode of 95 ms, a median of 123 ms, and a maximum of 4,938 ms.Each pair of nodes is attributed a latency from that distribution and it remains static throughout a simulation.

Block sizes.
According to Etherscan's daily averages [67], the Ethereum block size has steadily increased to reach 43 kB at the end of 2020. Given such small blocks, we assume miners can emit them instantaneously. Bandwidth thus has no impact on transmission delays in our simulations.

Block creation periods.
We use Etherscan's daily averages from July 2015 to the end of 2020 [68] to derive the block creation periods in our experiments.The distribution exhibits low variability: 10 th percentile of 13.08 s, median of 14.15 s, 90 th percentile of 17.17 s, and maximum of 30.31 s.
Such a long block creation period, relative to propagation delays, makes concurrent block creations, and hence forks, extremely unlikely in Ethereum. In such a setting, GPS cannot offer a differentiated service. GPS however makes it possible to mitigate the impact of concurrent block creations, and hence to use higher block frequencies. To explore this potential benefit, we artificially increase block creation frequencies in the following by a factor of 10, 100 or 200.
Impact of block creation periods. Fig. 14 depicts the ratio of nodes that receive updates (blocks) in the wrong order (akin to Fig. 10), while Fig. 15 depicts the latency needed for nodes to receive all blocks (akin to Fig. 13), for the three acceleration factors, 200, 100, and 10.
BRQ1 results. We first observe in Fig. 14 that increasing the frequency of block creation induces an increase in the number of inconsistent nodes, as expected. The ×10 scenario does not show any noticeable difference in inconsistency or latency between GPS and the uniform baseline. In the ×100 and ×200 experiments, Primaries (miners) experience a greater variability of the inconsistency metric compared to Secondaries (clients) and nodes in Uniform gossip, similarly to Fig. 10. In these last two scenarios, clients are slightly more consistent, hence more secure, with GPS than miners and nodes in Uniform gossip. With GPS we see an average of 2.70% inconsistent miners and 2.60% inconsistent clients in Fig. 14a, and 0.35% inconsistent miners and 0.32% inconsistent clients in Fig. 14b. We believe the small difference is due to the very small density of miners of 0.014.
BRQ2 results. We see in Fig. 15 visible latency gains for miners compared to clients and nodes in Uniform gossip in the ×100 and ×200 scenarios. These gains allow miners to spend less time mining on a stale blockchain. Miners actually receive all blocks ≈160 ms faster on average than clients and nodes in Uniform gossip in all scenarios, since latency gains are tied to the density of miners (cf. Fig. 13).
Results summary. We summarize this evaluation w.r.t. the two BRQs presented initially. For BRQ1, we note clients' security is slightly improved by GPS. As for BRQ2, GPS greatly benefits miners when blocks are created at a high frequency, by enabling them to converge to a consistent state up to 13% faster than with Uniform gossip (cf. Fig. 15a). We conclude that GPS improves blockchain systems and that its impact greatly depends on the density of miners (0.014 here).

S E C U R I T Y D I S C U S S I O N
As is, GPS and UPS are not resilient to attackers, or Byzantine nodes [69] in general, but can be extended towards this goal.
Secure RPSs. As hinted at in §6.7, securing GPS requires the use of secure RPSs [52], [70] to establish robust random network topologies. Such topologies maintain strong network connectivity with high probability under adversarial conditions, which is instrumental in ensuring the reliability of a secure gossip algorithm. By contrast, weak network connectivity can be exploited by attackers to launch eclipse attacks [71] against targeted nodes, which, in the case of blockchains [63], [72]-[75], can lead to miners wasting resources and clients accepting double-spent coins.
More concretely, a secure RPS limits the number of Byzantine nodes in the views it returns.This makes it possible to select a fanout that guarantees that gossip algorithms using this secure RPS are eventually reliable, i.e. they eventually deliver their messages to every node with high probability.This fanout value, logarithmic in the number of correct nodes, can be determined precisely using a known analysis [76] based on the probability of connectivity of random graphs [77].
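The fanout calculation alluded to above can be sketched with the classic random-graph connectivity approximation: with n correct nodes and a fanout of ln n + c, the dissemination graph is connected with probability about exp(−e^{−c}). The helper name below is ours, and the formula is this standard approximation rather than the cited analysis [76] itself.

```python
# Back-of-the-envelope sketch (ours) of a logarithmic fanout choice,
# assuming the classic connectivity approximation for random graphs:
# fanout = ln(n) + c  =>  P[connected] ~ exp(-exp(-c)).
import math

def fanout_for_reliability(n_correct, target):
    """Smallest integer fanout whose connectivity probability reaches
    `target` under the exp(-exp(-c)) approximation."""
    # target = exp(-exp(-c))  =>  c = -ln(-ln(target))
    c = -math.log(-math.log(target))
    return math.ceil(math.log(n_correct) + c)

print(fanout_for_reliability(1_000_000, 0.9999))  # four-nines target
```

The result grows only logarithmically with the number of correct nodes, which is what makes gossip-based reliability affordable at scale.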
Attackers could also sabotage the performance of GPS by lying about which nodes are Primary or Secondary. The secure RPSs used by GPS should protect against these attacks. For instance, they can require nodes to sign their membership and ensure that a node cannot belong to both classes.

BRB GPS.
Securing GPS beyond the use of secure RPSs could be achieved by exploiting techniques from Byzantine reliable broadcast (BRB) [19, §9.3]. A BRB algorithm guarantees reliability and prevents equivocation [78], i.e. conflicting messages from a node, which can be used in blockchains to double-spend coins. BRB protocols are far more complex and costly than best-effort protocols akin to GPS. A BRB version of GPS would likely (1) use signatures to authenticate message senders, (2) form probabilistic Byzantine quorums using said signatures and samplings from the RPSs, akin to Contagion [34], and (3) be composed of additional communication phases that multiply the message complexity, e.g. Bracha's BRB [79] includes Echo and Ready phases.
However, the Byzantine model of BRB limits its uses.For instance, this model does not fit permissionless blockchains, while GPS' does (cf.§ 6.7).Designing a secure, reliable broadcast for permissionless networks is an open question.
Secure UPS. Once GPS is secured, securing UPS would likely require updates to (1) match application-specific security criteria for validity and/or (2) be signed by their source. For instance, transactions in Bitcoin must be signed, while blocks must have a valid proof of work but are not signed.

R E L A T E D W O R K
Differentiated consistency. Hybrid consistency conditions have been extensively studied for distributed shared memory [25], [80], [81] and geo-distributed systems [26]-[29], [82]. RedBlue [26] and Fisheye [28] both propose hybrid conditions for geo-replicated systems. RedBlue proposes a consistency/latency trade-off on operations, rather than on nodes as UPS does. With Fisheye, a strong consistency condition is achieved among topologically-close nodes while a weaker one is achieved for remote nodes. Fisheye considers neither eventual consistency nor convergence speed.
Measuring inconsistency. Several approaches exist to evaluate the consistency of a system. Zellag and Kemme proposed [48] using a dependency graph of transactions (nodes) to highlight the conflicts between them (edges) in cloud services; cycles in the graph represent inconsistencies. This metric requires global knowledge of the system, which is impractical in large-scale systems. Golab et al. proposed two metrics to measure data staleness in key-value store traces: ∆atomicity [45] and Γ [46]. However, we cannot use them as they do not consider the ordering of update operations.
Other approaches evaluate consistency practically by using system-based metrics such as the read-after-write latency [47] or the similarity between different cache levels [5].
Finally, measuring inconsistencies becomes unnecessary when using CRDTs [13], since inconsistencies due to operation ordering are impossible with these data types. CRDTs naturally lead to eventual consistency with no extra effort.
Biased gossip protocols. Many approaches have explored the use of bias in gossip protocols to accommodate the inherent heterogeneity of systems. Yet, none has tackled heterogeneous consistency requirements before GPS.
For example, Directional gossip [84] improves overall reliability by favoring weakly connected nodes. The work in [85] reduces broadcasts' message complexity by differentiating its quality of service between good nodes and bad nodes, as defined by the user. Messages are first rapidly broadcast to good nodes using a reactive gossip protocol, while a slower but cheaper periodic push gossip is used to reach bad nodes. The periodic gossip reduces the number of messages but increases the delivery latency for bad nodes. Gravitational gossip [86] offers differentiated reliability, balancing communication workload between nodes according to their capacities. Nodes receive a fraction r, a user-defined quality rating, of the messages before they time out. Gravitational gossip hence offers a cost/reliability trade-off. Hierarchical gossip [87] greatly reduces message complexity by leveraging the physical network topology. Nodes favor gossip targets that are close in the network hierarchy, resulting in a slight increase in delivery latency since message flooding is avoided. Perigee [88] accelerates dissemination by learning and adapting each node's neighborhood based on its round-trip latencies with other nodes. HEAP [89] reduces video-streaming delivery latency by adapting each node's fanout to account for heterogeneous bandwidth capabilities.
In the context of blockchain networks, Marçal et al. [90] reduce the transaction message complexity of Bitcoin miners, by reducing the size of their neighborhood, and their transaction latencies, by orienting new transactions towards miners first. This approach exclusively favors miners without improving clients' experience, unlike GPS as deployed in §6.7. Similarly, gossiping tailored for Hyperledger Fabric [91] also reduces message complexity and overall transaction latencies for miners, but does not consider clients either. Finally, ecBroadcast [17] and EpTO [16] achieve probabilistic total order and can thus be used to implement probabilistic strong consistency conditions at the cost of higher latency, and a higher message complexity for EpTO.

C O N C L U S I O N
Update-query consistency with Primaries and Secondaries (UPS), with its underlying gossip protocol Gossip Primary-Secondary (GPS), provides a novel eventual consistency mechanism offering differentiated data consistency and delivery latency properties. Primary nodes deliver updates faster at the cost of a small consistency penalty, while Secondary nodes experience stronger consistency at a slightly higher latency. Both node classes reach a consistent state with high probability once the dissemination of all updates is complete.
Our formal analyses and our evaluations on a one-million-node network both highlighted the impact of the density (fraction) of Primary nodes on the trade-off between consistency and latency experienced by all nodes. A low density favors a fast dissemination to Primary nodes, while a high density favors higher consistency for Secondary nodes. We further evaluated GPS in a simulated blockchain network, showing that it improves miners' efficiency and, slightly, clients' security when blocks are created at a high frequency.

A C K N O W L E D G M E N T S
The authors wish to thank the reviewers of IEEE TPDS for their valuable feedback and Lucianna Kiffer for generously sharing her datasets [10].This work was partially funded by the French ANR grants O'Browser (ANR-16-CE25-0005-03) and ByBloS (ANR-20-CE25-0002-01).

A P P E N D I X C H A N G E L O G
A preliminary version of this article, titled "Speed for the elite, consistency for the masses: differentiating eventual consistency in large-scale distributed system", appears in the proceedings of SRDS 2016 [1]. In addition to general improvements, the current article notably adds:
• A compartment-based discrete analysis of GPS in §4.2 that utilizes models originally developed in epidemiology to help us study the consistency of GPS, and complements the asymptotic analysis in §4.1 that studies latency and message complexity;
• A comparison between analytical and experimental results on consistency in §6.4, obtained thanks to the new compartment-based discrete analysis of GPS;
• Precise numbers on the reliability and overhead of GPS in our evaluation in §6.6 and Tab. 1;
• An evaluation of GPS in §6.7 on a simulated blockchain network that closely matches the characteristics of Ethereum's network;
• A security discussion in §7 that presents possible improvements towards securing GPS and UPS.

A P P E N D I X P R O O F S F O R T H E C O M P A R T M E N T - B A S E D D I S C R E T E A N A L Y S I S
We here provide the proofs for Thm. 1 and Thm. 2 presented in the compartment-based discrete analysis described in §4.2.
For convenience, we repeat the two theorems and Fig. 5.
The messages received during round r by Primaries are those sent during round r − 1 by the nodes in P^{r−1}_{1,0} and P^{r−1}_{2+,0}. We therefore have

Pr_P[receive = 0] = (1 − β_P)^{P^{r−1}_{1,0} + P^{r−1}_{2+,0}},

where β_P = f/|P| is the infection rate of Primaries. Let us note ∆^r(P_{i,j}) the population change of nodes in compartment P_{i,j} during round r, i.e. ∆^r(P_{i,j}) = P^{r+1}_{i,j} − P^r_{i,j}.
Theorem 2. The evolution of Secondary nodes follows the formula S^{r+1}_{0,0} = c(P^r_{0,0}, P^{r−1}_{0,0}, P^r_{1,1}, P^{r−1}_{1,1}, S^r_{0,0}, S^{r−1}_{0,0}).

Proof. The reasoning for S_{0,0} follows that of P_{0,0}. As for P_{0,0}, we will assume for simplicity that a Secondary node might select itself when gossiping a message. As a result, we do not need to distinguish between sender and receiver in terms of probability of receiving a message, or to distinguish broadcasts from Primaries to Secondaries on one hand from broadcasts within Secondaries on the other.
In the following we will note β_S = f/|S| the probability that a Secondary node receives a given broadcast (whether originating from a Primary or a Secondary node).
During a round r, messages sent to Secondaries originate from Primaries that received a second message during round r − 1 (making up P^{r−1}_{2+,1} + P^{r−1}_{2+,0} messages), and from Secondaries that received their first message during round r − 1 (i.e. S^{r−1}_{1+,0} messages). These messages are received during round r and determine the size of the Secondary compartments S_{0,0} and S_{1+,1} at the start of round r + 1. (Recall that S_{0,0} + S_{1+,1} = |S| when a round starts.) The nodes remaining in the compartment S_{0,0} are those that receive none of the P^{r−1}_{2+,1} + P^{r−1}_{2+,0} + S^{r−1}_{1+,0} messages sent during round r − 1:

S^{r+1}_{0,0} = S^r_{0,0} · (1 − β_S)^{P^{r−1}_{2+,1} + P^{r−1}_{2+,0} + S^{r−1}_{1+,0}}.
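The recurrence above can be iterated numerically. The toy sketch below is ours: it feeds the uninfected-Secondary compartment an illustrative sequence of per-round message counts, whereas in the analysis these counts come from the Primary compartments P_{2+,·} and from S_{1+,0}.

```python
# Toy iteration (ours) of the S_0,0 recurrence: the uninfected Secondary
# compartment shrinks by (1 - beta_S)^m when m messages target the class.

def iterate_s00(s_total, fanout, msgs_per_round):
    """msgs_per_round[r] = number of gossip messages addressed to the
    Secondary class during round r; returns S_0,0 at each round start."""
    beta_s = fanout / s_total  # prob. a given Secondary receives one message
    s00 = float(s_total)
    trace = [s00]
    for m in msgs_per_round:
        s00 *= (1 - beta_s) ** m  # Secondaries that dodge all m messages
        trace.append(s00)
    return trace

# Illustrative ramp-up of message counts as the epidemic takes off.
trace = iterate_s00(s_total=10_000, fanout=10, msgs_per_round=[1, 10, 100, 1000])
```

The geometric decay of the trace mirrors the exponential take-off of epidemic dissemination once enough senders are active.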

Fig. 4: Model of GPS and path of an update in the system.

[Listings of Alg. 2 (GPS) and Alg. 3 (Uniform gossip), garbled in extraction; the surviving explanatory text follows.]
• First, GPS runs one RPS per node class (Lines 10 and 11) to track each node class. Both Primaries and Secondaries use the RPS view of their class (i.e. a small random sample of nodes from this class) to retransmit a message they receive for the first time to fanout other nodes in their own class (Lines 22 and 23), thus implementing Phases 2 and 4.
• Second, nodes in GPS handle retransmissions differently depending on their class. Primaries use the inherent presence of message duplicates in gossip protocols to decide locally when to switch from Phase 2 to 3. Each node keeps count of the received copies of individual messages (Lines 8, 18 and 19).

Fig. 5: The compartments used in our discrete analysis. (a) Dissemination to Primaries. (b) Dissemination to Secondaries.
Figs. 10 and 13 depict mean values as symbols and minimum and maximum values as error bars; low variability may render error bars unnoticeable. Figs. 14 and 15 depict mean values as symbols and 95% confidence intervals, instead of extrema, as error bars, since they rely on realistic datasets.
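The two error-bar conventions can be computed as in the generic sketch below; the $z = 1.96$ normal approximation for the 95% interval is our assumption for illustration, not necessarily what the evaluation tooling used.

```python
import statistics

def error_bars(samples, use_ci=False):
    """Summarize repeated runs as (mean, lower, upper).

    use_ci=False -> mean with min/max extrema (as in Figs. 10 and 13).
    use_ci=True  -> mean with a 95% confidence interval under a normal
                    approximation, z = 1.96 (as in Figs. 14 and 15).
    """
    mean = statistics.mean(samples)
    if use_ci:
        half = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
        return mean, mean - half, mean + half
    return mean, min(samples), max(samples)
```

With identical runs the interval collapses to a point, which is why low variability can make error bars invisible.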

Fig. 13 compares the dissemination latency of updates using Uniform gossip and UPS with different densities. This figure shows that Primaries are infected faster with UPS than with Uniform gossip: Primaries receive all updates 1, 2 and 3 rounds earlier for densities of $10^{-1}$, $10^{-2}$ and $10^{-3}$, respectively.

Fig. 13: Dissemination latency of all 10 updates to Primaries and Secondaries using UPS for densities $d \in \{0.001, 0.01, 0.1\}$ and a baseline using Uniform gossip (closer to the top left is better). Compared to the baseline, Primaries receive all updates 1, 2 and 3 rounds earlier for densities of $10^{-1}$, $10^{-2}$ and $10^{-3}$, respectively, while Secondaries receive all updates only half a round later on average.
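The latency gap between node classes can be intuited with a standard mean-field model of uniform push gossip (our own approximation for illustration, not the paper's discrete analysis): an overlay restricted to the $d \cdot N$ Primaries completes dissemination in fewer rounds than the full $N$-node overlay, because dissemination time grows roughly logarithmically with overlay size.

```python
def rounds_to_infect(n, fanout, threshold=1.0 - 1e-4):
    """Mean-field estimate of uniform push-gossip dissemination latency.

    Starting from one infected node, every infected node pushes to
    `fanout` uniformly chosen targets each round; returns the number of
    rounds until the expected infected fraction reaches `threshold`.
    """
    infected = 1 / n  # infected fraction of the overlay
    rounds = 0
    while infected < threshold:
        # Probability a given node misses all n*infected*fanout pushes.
        miss = (1 - 1 / n) ** (n * infected * fanout)
        infected = 1 - (1 - infected) * miss
        rounds += 1
    return rounds
```

Comparing `rounds_to_infect(1_000, 10)` (a sparse Primary set, $d = 10^{-3}$ of a million nodes) with `rounds_to_infect(1_000_000, 10)` shows the smaller overlay finishing several rounds earlier, consistent with the trend in Fig. 13.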

Fig. 14: Ratio of inconsistent nodes during dissemination for Uniform gossip and GPS when block creation is accelerated 200, 100 and 10 times. Creating blocks faster leads to more inconsistencies (Fig. 14a), while creating them slowly removes the expected benefit for Secondaries (clients) (Fig. 14c).

Fig. 15: Dissemination latency of 10 blocks for Uniform gossip and GPS when block creation is accelerated 200, 100 and 10 times. Primaries (miners) receive blocks ≈160 ms faster than Secondaries (clients). This gain is hardly noticeable when updates are created slowly (Fig. 15c).

Davide Frey has been a researcher at Inria Rennes Bretagne-Atlantique since 2010. He received his PhD from Politecnico di Milano, Italy, in 2006; he then worked as a post-doctoral researcher at Washington University in St. Louis (MO) and at Inria Rennes before being recruited as a permanent researcher in 2010. His research interests focus on the systemic aspects of large-scale distributed systems.

Achour Mostefaoui received the MSc degree in computer science in 1991, and the PhD degree from the University of Rennes in 1994. He is a professor of computer science at the University of Nantes, France. He is the head of a master's diploma in computer science at the University of Nantes, and a co-head of the GDD research team within the LINA Lab.

Matthieu Perrin is an associate professor at the University of Nantes. His scientific interests cover the wide area of distributed computing, in particular algorithms in shared-memory and message-passing distributed systems, including the modeling of weakly consistent shared objects and message broadcast primitives.

Pierre-Louis Roman is a postdoctoral researcher in computer science at the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. He holds a PhD from the University of Rennes 1, obtained in 2018 for his work on decentralized systems. His research interests revolve around distributed systems, with a particular focus on secure and scalable systems such as distributed ledgers and cryptocurrencies.

François Taïani is a Professor in Distributed Computer Systems at the University of Rennes 1 and at IRISA/Inria in Rennes, France, where he heads the WIDE Inria research team. His main research interest lies in the scalability and programmability of complex distributed systems (e.g. overlays, on-line social networks, data centers), with a focus on resilience, concurrency, and self-organization.


TABLE 1: Mean number of messages per run of Uniform gossip and GPS, alongside reliability and overhead vs. Uniform gossip, for 10 updates sent to 1 M nodes with a fanout of 10.