Verifying safety of synchronous fault-tolerant algorithms by bounded model checking

Threshold automata are a formalism introduced for modeling, verification, and synthesis of fault-tolerant distributed algorithms for asynchronous systems, that is, in interleaving semantics. Owing to well-known limitations of what can be achieved in purely asynchronous systems, many fault-tolerant distributed algorithms are designed for synchronous or round-based semantics. In this paper, we introduce the synchronous variant of threshold automata and study their applicability and limitations for the verification of synchronous fault-tolerant distributed algorithms. We show that the parameterized reachability problem for synchronous threshold automata is undecidable. Still, we show that many synchronous fault-tolerant distributed algorithms have a bounded diameter, even though the algorithms are parameterized by the number of processes. Hence, bounded model checking can be used for verifying these algorithms. The existence of bounded diameters is the main conceptual insight in this paper. We compute the diameter of several algorithms and check their safety properties, using SMT queries that contain quantifiers for dealing with the parameters symbolically. Surprisingly, the performance of SMT solvers on these queries is very good, reflecting the recent progress in dealing with quantified queries. We found that the diameter bounds of synchronous algorithms in the literature are tiny (from 1 to 8), which makes our approach applicable in practice. For a specific class of algorithms, we also establish a theoretical result on the existence of a diameter, providing a first explanation for our experimental results.


Introduction
Fault-tolerant distributed systems and algorithms are hard to design and verify. Their verification has recently been addressed with a wide range of techniques. Mechanized verification techniques like IronFleet [2] and Verdi [3] were used to verify implementations of asynchronous distributed algorithms. Later, Disel [4] introduced a logic to make the reasoning less protocol-specific. Deductive verification techniques like natural proofs [5], Ivy [6,7], and PSync [8,9] try to increase the degree of automation and thus limit the required user guidance. Model checking-based techniques [10][11][12][13][14] are fully automated, but more restricted regarding the protocols they are applicable to. Most of these methods focus on asynchronous distributed algorithms. In this paper, we focus on synchronous (round-based) distributed algorithms, partly because they find applications in distributed real-time systems [15], and partly because it has been observed that reductions [16][17][18][19] allow us to verify an interesting class of asynchronous algorithms under synchronous semantics.
Fig. 1 Pseudocode of synchronous reliable broadcast à la [23], and its STA, with locations $\mathcal{L}$, initial locations $\mathcal{I}$, parameters $\Pi$, rules $\mathcal{R}$, guards $\varphi_1, \dots, \varphi_4$, resilience condition $RC$, and counter invariant $\chi$
. . . 34], and benchmarks that tolerate send omissions from [29,34].
5. We are the first to automatically verify the Byzantine and send omission benchmarks. For the crash benchmarks, our method performs significantly better than the abstraction-based method in [35]. By tweaking the constraints on the parameters $n$, $t$, $f$, we introduce configurations with more faults than expected, for which our technique automatically finds a counterexample.
The remainder of the paper is organized as follows. In Sect. 2, we give a high-level overview of the problem we consider and the proposed solutions. Section 3 introduces the synchronous variant of threshold automata and counter systems. We prove undecidability of parameterized reachability for synchronous threshold automata in Sect. 4. In Sect. 5, we show how to compute the diameter of synchronous threshold automata, and in Sect. 6, we show how to encode safety properties as bounded reachability queries. We present our experimental results in Sect. 7 and discuss related and future work in Sect. 8.

Overview of our approach
Bounded Diameter. Consider the pseudocode of the reliable broadcast algorithm and its corresponding STA in Fig. 1. The processes execute the send, receive, and local computation steps in lock-step. One iteration of the loop is expressed as an STA edge that connects the locations before and after an iteration (i.e., the STA models the loop body of the pseudocode). The location se encodes that v is 1 and accept is false. That is, se is the location in which processes send <ECHO> in every round. If a process sets accept to true, it goes to location ac. The location where v is 1 is encoded by v1, and the location where v is 0 by v0.
An example execution is depicted in Table 1 on the left. We run $n - f$ copies of the STA in Fig. 1. Observe that the guards of the rules $r_1$ and $r_2$ are both enabled in the configuration $\sigma_0$, since by the resilience condition we have $t \geq f$, and since in $\sigma_0$ there is only $1 < t + 1$ process in v1. One process uses $r_2$ to move from v0 to se, while the others use the self-loop $r_1$ to stay in v0. (The process in v1 uses $r_3$ to move to se.) As both rules $r_1$ and $r_2$ remain enabled, in every round one more process can go to se. Hence, the configuration $\sigma_{t+1}$ has $t + 1$ correct processes in location se, and the rule $r_1$ becomes disabled. Then, all remaining processes go to se and finally to ac. The length of this execution depends on the parameter $t$ and is thus unbounded for increasing values of $t$. (We note that we can obtain even longer executions if some processes use the rule $r_4$.) On the right, we see an execution starting in the same configuration $\sigma_0$, in which all processes take a step using $r_2$ immediately. That is, while the configuration $\sigma_{t+3}$ is reached by a long execution on the left, it is reached in just two steps on the right (observe that $\sigma'_2 = \sigma_{t+3}$). We are interested in whether there is a natural number $k$, which does not depend on the parameters $n$, $t$, and $f$, such that we can always shorten executions to executions of length at most $k$ that have the same initial and final configurations. (By length, we mean the number of transitions in an execution.) In such a case, we say that the STA has a bounded diameter. In Sect. 5.1, we introduce an SMT-based procedure that enumerates candidates for the diameter bound and checks whether a candidate is indeed the diameter; if it finds such a bound, it terminates. For the STA in Fig. 1, this procedure computes the diameter 2.
Threshold Automata with Traps. In Sect. 5.2, we define a fragment of STA for which we theoretically guarantee a bounded diameter. For example, the STA in Fig. 1 falls in this fragment, and we obtain a guaranteed diameter of at most 8. The fragment is defined by two conditions: (i) The STA has a structure that implies monotonicity of the guards: the set of locations that are used in the guards (e.g., {v1, se, ac}) is closed under the rules, i.e., from each location within the set, the STA can reach only locations in the set. We call guards that have this property trapped. (ii) The STA has no cycles, except possibly self-loops.
Bounded Model Checking, Completeness and (Un-)Decidability. The existence of a bounded diameter motivates the use of bounded model checking for verifying safety properties. In Sect. 6, we give an SMT encoding for checking the violation of a safety property by executions with length up to the diameter. Crucially, this approach is complete, because if an execution reaches a bad configuration, this bad configuration is already reached by an execution of bounded length. We observe that for the STA defined in this paper (with linear guards and linear constraints on the parameters), the SMT encoding results in a Presburger arithmetic formula (with one quantifier alternation). Hence, checking safety properties (that can be expressed in Presburger arithmetic) is decidable for STA with bounded diameter. In Sect. 7, we also experimentally demonstrate that current SMT solvers handle these quantified formulae well. In contrast, we show in Sect. 4 that the parameterized reachability problem is undecidable for general STA. This implies that there are STA with unbounded diameter.
Fig. 2 Pseudocode of FloodMin from [33], and STA encoding its loop body, for $k = 1$, with locations $\mathcal{L}$, initial locations $\mathcal{I}$, parameters $\Pi$, rules $\mathcal{R}$, guards $\varphi_1, \varphi_2$, resilience condition $RC$, and counter invariant $\chi$
Threshold Automata with Untrapped Guards. The FloodMin algorithm in Fig. 2 solves the k-set agreement problem. This algorithm is run by $n$ replicated processes, up to $t$ of which may fail by crashing. In k-set agreement, the goal of the processes is to decide on a value, such that no more than $k$ distinct values are decided on by the $n$ processes. For simplicity of presentation, we consider the case $k = 1$, which turns k-set agreement into consensus. In Fig. 2, we have the STA that captures the loop body. The locations c0 and c1 correspond to the case when a process is crashing in the current round and may manage to send the value 0 and 1, respectively; afterwards, the process remains in the crashed location "✖" and does not send any messages starting with the next round. We observe that the guard $\#\{v0, c0\} > 0$ is not trapped, so our result about trapped guards does not apply. Nevertheless, our SMT-based procedure finds a diameter of 2.
In the same way, we automatically found a bound on the diameter for several benchmarks from the literature.It is remarkable that the diameter for the transition relation of the loop body (without the loop condition) is bounded by a constant, independent of the parameters.

Bounded Model Checking of Distributed Algorithms with Clean Rounds. The number $\lfloor t/k \rfloor + 1$ of loop iterations of the FloodMin algorithm has been designed such that it ensures (together with the environment assumption of at most $t$ crashes) that there is at least one clean round, in which at most $k - 1$ processes crash. The correctness of the FloodMin algorithm relies on the occurrence of such a clean round. We make use of the existence of clean rounds by employing the following two-step methodology for the verification of safety properties: (i) we find all reachable clean-round configurations, and (ii) we check whether a bad configuration is reachable from those configurations. A detailed description of this methodology can be found in Sect. 6. Our method requires the encoding of a clean round as input (e.g., for Fig. 2, that no process is in c0 or c1). We leave detecting and encoding clean rounds automatically from the fault environment for future work.

Synchronous threshold automata
We introduce the syntax of synchronous threshold automata and give some intuition of the semantics, which we will formalize as counter systems below.
A synchronous threshold automaton (STA) is a tuple $\mathsf{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$, where $\mathcal{L}$ is a finite set of locations, $\mathcal{I} \subseteq \mathcal{L}$ is a non-empty set of initial locations, $\Pi$ is a finite set of parameters, $\mathcal{R}$ is a finite set of rules, $RC$ is a resilience condition, and $\chi$ is a counter invariant, defined in the following. We assume that the set $\Pi$ of parameters contains at least the parameter $n$, denoting the number of processes. The resilience condition $RC$ is a linear arithmetic expression over the parameters from the set $\Pi$. We call the vector $\pi = \langle \pi_1, \dots, \pi_{|\Pi|} \rangle$ the parameter vector, where $\pi_i \in \Pi$ is a parameter, for $1 \leq i \leq |\Pi|$. A vector $p = \langle p_1, \dots, p_{|\Pi|} \rangle \in \mathbb{N}^{|\Pi|}$ of natural numbers is called an instance of $\pi$, and we denote by $p[\pi_i] = p_i$ the value assigned to the parameter $\pi_i$ in the instance $p$. The set of admissible instances of $\pi$ is defined as $P_{RC} = \{p \in \mathbb{N}^{|\Pi|} \mid p \text{ is an instance of } \pi \text{ and } p \text{ satisfies } RC\}$. The mapping $N : P_{RC} \to \mathbb{N}$ maps an admissible instance $p \in P_{RC}$ to the number $N(p)$ of processes that participate in the algorithm, i.e., the number of processes whose behavior is modeled using the STA.
For example, for the STA in Fig. 1, we have $RC \equiv n > 3t \wedge t \geq f$, as the algorithm that this STA models tolerates Byzantine faults and requires that fewer than a third of the processes are faulty. A vector $p \in \mathbb{N}^{|\Pi|}$ is an admissible instance of the parameter vector $\pi = \langle n, t, f \rangle$ if $p$ satisfies $RC$, that is, if
$$p[n] > 3 \cdot p[t] \wedge p[t] \geq p[f].$$
Since only the $n - f$ correct processes are modeled by copies of the STA, we have $N(p) = p[n] - p[f]$.
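The resilience condition can be explored concretely. The following Python sketch enumerates the admissible instances $\langle n, t, f \rangle$ of the STA in Fig. 1 for small parameter values and computes the number of modeled processes. It is an illustration only: the function names and the bound `max_n` are hypothetical, and the sketch assumes, following the running example in which $n - f$ copies of the STA are run, that $N(p) = p[n] - p[f]$.

```python
from itertools import product
from typing import Dict, List


def satisfies_rc(n: int, t: int, f: int) -> bool:
    """Resilience condition of the STA in Fig. 1: n > 3t and t >= f."""
    return n > 3 * t and t >= f


def admissible_instances(max_n: int) -> List[Dict[str, int]]:
    """All admissible instances p = <n, t, f> with every parameter <= max_n."""
    return [
        {"n": n, "t": t, "f": f}
        for n, t, f in product(range(max_n + 1), repeat=3)
        if satisfies_rc(n, t, f)
    ]


def num_processes(p: Dict[str, int]) -> int:
    """N(p): number of modeled processes (assumed: the n - f correct ones)."""
    return p["n"] - p["f"]


if __name__ == "__main__":
    for p in admissible_instances(4):
        print(p, "N(p) =", num_processes(p))
```

Even this tiny range shows that, e.g., $\langle 3, 1, 0 \rangle$ is rejected, since three processes cannot tolerate one Byzantine fault.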
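We introduce counter atoms of the form $\psi \equiv \#L \geq a \cdot \pi + b$, where $L \subseteq \mathcal{L}$ is a set of locations, $\#L$ denotes the total number of processes currently in the locations $\ell \in L$, $a \in \mathbb{Z}^{|\Pi|}$ is a vector of coefficients, $\pi$ is the parameter vector, and $b \in \mathbb{Z}$. We will use the counter atoms for expressing guards and predicates in the verification problem. In the following, we will use two abbreviations: $\#L = a \cdot \pi + b$ for the formula $\#L \geq a \cdot \pi + b \wedge \neg(\#L \geq a \cdot \pi + b + 1)$, and $\#L < a \cdot \pi + b$ for the formula $\neg(\#L \geq a \cdot \pi + b)$. A rule $r \in \mathcal{R}$ is a tuple $(\mathit{from}, \mathit{to}, \varphi)$, where $\mathit{from} \in \mathcal{L}$ and $\mathit{to} \in \mathcal{L}$ are locations, and $\varphi$ is a guard whose truth value determines whether the rule $r$ can be executed. The guard $\varphi$ is a Boolean combination of counter atoms. We denote by $\Psi$ the set of counter atoms occurring in the guards of the rules $r \in \mathcal{R}$.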
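The counter invariant $\chi$ is a Boolean combination of counter atoms $\#L \geq a \cdot \pi + b$, where each atom occurring in $\chi$ restricts the number of processes allowed to populate the locations in $L \subseteq \mathcal{L}$.
Counter Systems. The counter atoms are evaluated over tuples $(\kappa, p)$, where $\kappa \in \mathbb{N}^{|\mathcal{L}|}$ is a vector of counters, and $p \in P_{RC}$ is an admissible instance of $\pi$. For a location $\ell \in \mathcal{L}$, the counter $\kappa[\ell]$ denotes the number of processes that are currently in the location $\ell$. A counter atom $\psi \equiv \#L \geq a \cdot \pi + b$ holds in $(\kappa, p)$, written $(\kappa, p) \models \psi$, iff $\sum_{\ell \in L} \kappa[\ell] \geq a \cdot p + b$. The semantics of the Boolean connectives is standard.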
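As an illustration of the semantics of counter atoms, the following Python sketch evaluates $(\kappa, p) \models \#L \geq a \cdot \pi + b$ on a concrete counter vector and instance, together with the equality abbreviation read as $\#L \geq a \cdot \pi + b \wedge \neg(\#L \geq a \cdot \pi + b + 1)$. The dictionary representation and the example atom $\#\{v1, se, ac\} \geq t + 1$ are assumptions made for this sketch, not guards taken from Fig. 1.

```python
from typing import Dict, FrozenSet, Sequence

Counters = Dict[str, int]   # kappa: location -> number of processes
Instance = Dict[str, int]   # p: parameter name -> value


def count(kappa: Counters, locs: FrozenSet[str]) -> int:
    """#L: the total number of processes in the locations of L."""
    return sum(kappa.get(loc, 0) for loc in locs)


def holds_geq(kappa: Counters, p: Instance, locs: FrozenSet[str],
              a: Sequence[int], params: Sequence[str], b: int) -> bool:
    """(kappa, p) |= #L >= a . pi + b, with a . pi evaluated at the instance p."""
    threshold = sum(coeff * p[name] for coeff, name in zip(a, params)) + b
    return count(kappa, locs) >= threshold


def holds_eq(kappa: Counters, p: Instance, locs: FrozenSet[str],
             a: Sequence[int], params: Sequence[str], b: int) -> bool:
    """#L = a . pi + b, read as #L >= a . pi + b and not #L >= a . pi + b + 1."""
    return (holds_geq(kappa, p, locs, a, params, b)
            and not holds_geq(kappa, p, locs, a, params, b + 1))


PARAMS = ("n", "t", "f")
LOCS = frozenset({"v1", "se", "ac"})   # hypothetical atom #{v1, se, ac} >= t + 1
kappa = {"v0": 2, "v1": 1}
p = {"n": 4, "t": 1, "f": 1}
print(holds_geq(kappa, p, LOCS, (0, 1, 0), PARAMS, 1))  # False, since 1 < 2
```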
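A transition is a function $t : \mathcal{R} \to \mathbb{N}$ that maps a rule $r \in \mathcal{R}$ to a factor $t(r) \in \mathbb{N}$, denoting the number of processes that apply this rule. Given an instance $p$ of $\pi$, we denote by $T(p)$ the set $\{t \mid \sum_{r \in \mathcal{R}} t(r) = N(p)\}$ of transitions whose rule factors sum up to the number $N(p)$ of participating processes. A transition $t \in T(p)$ is enabled in $(\kappa, p)$ iff (1) for every rule $r \in \mathcal{R}$ with $t(r) > 0$, it holds that $(\kappa, p) \models r.\varphi$, and (2) for every location $\ell \in \mathcal{L}$, it holds that $\kappa[\ell] = \sum_{r \in \mathcal{R} \wedge r.\mathit{from} = \ell} t(r)$.
The first condition ensures that processes only use rules whose guards are satisfied, and the second that every process moves in an enabled transition.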
Observe that each transition $t \in T(p)$ defines a unique tuple $(\kappa, p)$ in which it is enabled. We call the origin of a transition $t \in T(p)$ the tuple $o(t) = (\kappa, p)$ such that for every location $\ell \in \mathcal{L}$, we have $o(t).\kappa[\ell] = \sum_{r \in \mathcal{R} \wedge r.\mathit{from} = \ell} t(r)$. Similarly, each transition defines a unique tuple $(\kappa, p)$ that is the result of applying the transition in its origin. We call the goal of a transition $t \in T(p)$ the tuple $g(t) = (\kappa, p)$ such that for every location $\ell \in \mathcal{L}$, we have $g(t).\kappa[\ell] = \sum_{r \in \mathcal{R} \wedge r.\mathit{to} = \ell} t(r)$.
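The origin and goal of a transition are simple sums over rule factors. The Python sketch below computes $o(t)$ and $g(t)$ and checks whether a transition is enabled in a counter vector, under the reading that an enabled transition belongs to $T(p)$, only uses rules whose guards hold, and has that counter vector as its origin. The rule names, locations, and guards form a hypothetical toy automaton loosely inspired by the description of Fig. 1; they are not the actual rules of the paper.

```python
from collections import Counter
from typing import Callable, Dict, NamedTuple


class Rule(NamedTuple):
    src: str                                   # r.from
    dst: str                                   # r.to
    guard: Callable[[Dict[str, int]], bool]    # r.phi for the fixed instance p


def _sum(kappa: Dict[str, int], *locs: str) -> int:
    return sum(kappa.get(loc, 0) for loc in locs)


# Hypothetical toy rules; names and guards are illustrative only.
RULES: Dict[str, Rule] = {
    "r1": Rule("v0", "v0", lambda k: _sum(k, "v1", "se", "ac") < 2),
    "r2": Rule("v0", "se", lambda k: True),
    "r3": Rule("v1", "se", lambda k: True),
    "r_acc": Rule("se", "ac", lambda k: _sum(k, "se", "ac") >= 2),
}


def origin(t: Dict[str, int], rules: Dict[str, Rule]) -> Counter:
    """o(t): kappa[l] is the sum of t(r) over the rules r with r.from = l."""
    kappa: Counter = Counter()
    for name, factor in t.items():
        if factor > 0:
            kappa[rules[name].src] += factor
    return kappa


def goal(t: Dict[str, int], rules: Dict[str, Rule]) -> Counter:
    """g(t): kappa[l] is the sum of t(r) over the rules r with r.to = l."""
    kappa: Counter = Counter()
    for name, factor in t.items():
        if factor > 0:
            kappa[rules[name].dst] += factor
    return kappa


def enabled(t: Dict[str, int], kappa: Dict[str, int],
            rules: Dict[str, Rule], n_procs: int) -> bool:
    """t is in T(p), uses only rules whose guards hold, and has kappa as origin."""
    if sum(t.values()) != n_procs:
        return False
    if any(f > 0 and not rules[name].guard(kappa) for name, f in t.items()):
        return False
    return dict(origin(t, rules)) == {loc: k for loc, k in kappa.items() if k > 0}


kappa0 = {"v0": 3, "v1": 1}
t0 = {"r1": 2, "r2": 1, "r3": 1}
print(enabled(t0, kappa0, RULES, 4), dict(goal(t0, RULES)))
```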
We now define a counter system, for a given admissible instance $p \in P_{RC}$ of the parameter vector $\pi$ and an automaton $\mathsf{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$.
We restrict ourselves to deadlock-free counter systems, i.e., counter systems whose transition relation is total (every configuration has a successor). A sufficient condition for deadlock freedom is that for every location $\ell \in \mathcal{L}$, it holds that $\chi \rightarrow \bigvee_{r \in \mathcal{R} \wedge r.\mathit{from} = \ell} r.\varphi$. This ensures that it is always possible to move out of every location, as there is at least one outgoing rule per location whose guard is satisfied.
To simplify the notation, in the following we write $\sigma[\ell]$ to denote $\sigma.\kappa[\ell]$.
Paths and Schedules in a Counter System. We now define paths and schedules of a counter system as sequences of configurations and transitions, respectively.
Definition 2 A path in the counter system $CS(\mathsf{STA}, p) = (\Sigma(p), I(p), R(p))$ is a finite sequence $\{\sigma_i\}_{i=0}^{k}$ of configurations such that for every two consecutive configurations $\sigma_{i-1}$ and $\sigma_i$, for $0 < i \leq k$, there exists a transition $t_i \in T(p)$ with $\sigma_{i-1} \xrightarrow{t_i} \sigma_i$. . . .
We call $\sigma_0$ the origin and $\sigma_k$ the goal of $\tau$, and write $\sigma_0 \xrightarrow{\tau} \sigma_k$. The following proposition states a property of feasible schedules, namely that in every transition of a feasible schedule all processes move; it is a consequence of Definition 3.

Parameterized reachability and its undecidability
We define the parameterized reachability problem for synchronous threshold automata.
Definition 4 (Parameterized Reachability) Given a formula $\varphi$ that is a Boolean combination of counter atoms, and $\mathsf{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$, the parameterized reachability problem is to decide whether there exists an admissible instance $p \in P_{RC}$ such that in the counter system $CS(\mathsf{STA}, p)$, there is an initial configuration $\sigma \in I(p)$ and a feasible schedule $\tau$ with $\sigma \xrightarrow{\tau} \sigma'$ and $\sigma' \models \varphi$.
We show that this problem is undecidable in general, by reduction from the halting problem of a two-counter machine (2CM) [36].Such reductions are common in parameterized verification, e.g., see [37].
A two-counter machine (2CM) $M$ consists of two registers $A$ and $B$, and a set $I = \{\mathit{inc}_i, \mathit{dec}_i, \mathit{jz}_i(k), \mathit{halt} \mid i \in \{A, B\} \text{ and } k \in \mathbb{N}\}$ of instructions, consisting of increment $\mathit{inc}_i$, decrement $\mathit{dec}_i$, and jump-if-zero $\mathit{jz}_i(k)$ instructions for each register $i \in \{A, B\}$, together with a halting instruction $\mathit{halt}$ for the machine. A sequence $P = \mathit{inst}_1, \dots, \mathit{inst}_m$, where $\mathit{inst}_i \in I$ for $1 \leq i \leq m$ and $m \in \mathbb{N}$, is called the program of $M$. The machine $M$ starts the execution of the program $P$ at the initial instruction $\mathit{inst}_1$ and proceeds as follows. If the instruction $\mathit{inst}_j$, for $1 \leq j \leq m$, is an $\mathit{inc}_i$ (resp. $\mathit{dec}_i$) instruction, it increments (resp. decrements) the register $i$, for $i \in \{A, B\}$, and moves the control of the program to location $j + 1$. If $\mathit{inst}_j$ is a $\mathit{jz}_i(k)$ instruction, it moves the control of the program to the location $k$ if the register $i$ contains the value 0, for $i \in \{A, B\}$, and to the location $j + 1$ otherwise. We require that $\mathit{inst}_m = \mathit{halt}$ is the halting instruction of $P$. We assume that initially the registers $A$ and $B$ contain the value 0.
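To make the semantics of the 2CM concrete, here is a minimal Python interpreter sketch. The tuple encoding of instructions is a hypothetical choice for this example, and since the text does not say what decrementing a zero register does, the sketch raises an error in that case. Because halting is undecidable, the interpreter can only run for a bounded number of steps and returns `None` when the budget is exhausted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction encoding for this sketch:
#   ("inc", "A"), ("dec", "B"), ("jz", "A", k), ("halt",)
Instruction = Tuple


def run_2cm(program: List[Instruction],
            max_steps: int = 10_000) -> Optional[Tuple[int, int]]:
    """Run a 2CM program; return the final (A, B) if it halts within max_steps."""
    regs: Dict[str, int] = {"A": 0, "B": 0}   # both registers start at 0
    j = 1                                       # instructions are numbered from 1
    for _ in range(max_steps):
        inst = program[j - 1]
        op = inst[0]
        if op == "halt":
            return regs["A"], regs["B"]
        if op == "inc":
            regs[inst[1]] += 1
            j += 1
        elif op == "dec":
            if regs[inst[1]] == 0:
                # The text leaves decrementing an empty register unspecified.
                raise ValueError(f"dec on empty register {inst[1]} at {j}")
            regs[inst[1]] -= 1
            j += 1
        elif op == "jz":
            j = inst[2] if regs[inst[1]] == 0 else j + 1
        else:
            raise ValueError(f"unknown instruction {inst!r}")
    return None  # no verdict: bounded simulation cannot decide halting


# Count A up to 2, then down to 0 using a loop; instruction 5 is an
# unconditional jump, because register B is always 0.
PROGRAM = [("inc", "A"), ("inc", "A"), ("dec", "A"),
           ("jz", "A", 6), ("jz", "B", 3), ("halt",)]
print(run_2cm(PROGRAM))
```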
The halting problem of 2CM is known to be undecidable [36]. To prove undecidability of the parameterized reachability problem, we construct an automaton $\mathsf{STA}_M = (\mathcal{L}_M, \mathcal{I}_M, \Pi_M, \mathcal{R}_M, RC_M, \chi_M)$ such that every counter system $CS(\mathsf{STA}_M, p_M)$ induced by it simulates the steps that a 2CM $M$ takes when executing the program $P$. The constructed $\mathsf{STA}_M$ has a single parameter, the number $n$ of processes. The idea is that each of the $n$ processes plays one of two roles in the counter system: each process is used either to encode the control flow of the program $P$, or to encode the values of the registers in unary [38]. The former are called the controller processes and the latter the storage processes. Thus, $\mathsf{STA}_M$ consists of two parts, one for each role.
We now proceed by defining the structure of the automaton $\mathsf{STA}_M$. The set $\mathcal{L}_M = \mathcal{L}_C \cup \mathcal{L}_S$ of locations is partitioned into the locations $\mathcal{L}_C$ of the controller and the locations $\mathcal{L}_S$ of the storage. The set $\mathcal{L}_C$ of controller locations consists of a location $\ell_j$ for each instruction $\mathit{inst}_j$, where $1 \leq j \leq m$, an additional location $\ell^*_j$ if $\mathit{inst}_j$ is an increment or decrement instruction, and a special location $\ell_{\mathit{stuck}}$ that denotes a stuck configuration of the 2CM. The set $\mathcal{L}_S$ of storage locations contains the location $\ell_{\mathit{store}}$, one location $\ell_A$, $\ell_B$ for each of the registers, and one location $\ell_{\mathit{inc}_i}$, $\ell_{\mathit{dec}_i}$ per increment/decrement instruction $\mathit{inc}_i$, $\mathit{dec}_i$, for $i \in \{A, B\}$. Intuitively, the location $\ell_{\mathit{store}}$ is used to store the processes that will eventually move to one of the locations $\ell_i$, for $i \in \{A, B\}$, via $\ell_{\mathit{inc}_i}$, and those that move from $\ell_i$ via $\ell_{\mathit{dec}_i}$. The set $\mathcal{I}_M \subseteq \mathcal{L}_M$ of initial locations contains the locations $\ell_1 \in \mathcal{L}_C$ and $\ell_{\mathit{store}} \in \mathcal{L}_S$.
The set $\Pi_M$ of parameters contains the single parameter $n$, denoting the number of processes. As there are no other parameters, the resilience condition $RC_M$ holds for every instance $p_M$, i.e., every $p_M[n] \in \mathbb{N}$ is admissible.
The set $\mathcal{R}_M = \mathcal{R}_C \cup \mathcal{R}_S$ of rules consists of the rules $\mathcal{R}_C$ and $\mathcal{R}_S$ for the controller and storage processes, respectively. An increment (resp. decrement) of register $A$ is modeled by moving a single storage process to (resp. from) the location $\ell_A$ from (resp. to) the location $\ell_{\mathit{store}}$, and moving the controller processes to the location corresponding to the next instruction in two steps. That is, the controller processes move from $\ell_j$ to $\ell_{j+1}$ via the additional location $\ell^*_j$ if $\mathit{inst}_j$ is an increment (resp. decrement) instruction. The jump-if-zero instruction of register $A$ is modeled by moving the controller processes to the location $\ell_k$ if there are no processes in the location $\ell_A$, and to the location corresponding to the next instruction otherwise. The increment, decrement, and jump-if-zero instructions for register $B$ are modeled analogously.
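We now formally define the rules in $\mathcal{R}_M = \mathcal{R}_C \cup \mathcal{R}_S$. Let $\mathit{inst}_j$, for $1 \leq j \leq m$, be an instruction of the program $P$, and $i \in \{A, B\}$. For convenience, we use the notation "$\ell \rightarrow \ell'$ if $\varphi$" for the rule $(\ell, \ell', \varphi)$.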
The set $\mathcal{R}_C$ of rules of the controller contains: . . . Depending on $\mathit{inst}_j$, we consider the following cases.
Case 1. If $\mathit{inst}_j$ is an $\mathit{inc}_i$ instruction, then $\mathcal{R}_C$ contains the rules (depicted in Fig. 3): . . .
Case 2. If $\mathit{inst}_j$ is a $\mathit{dec}_i$ instruction, then $\mathcal{R}_C$ contains the rules: . . .
Fig. 3 The synchronous threshold automaton $\mathsf{STA}_M$, with the controller part on the left and the storage part on the right. The rules whose guards are depicted in the figure are used to encode the increment of register $A$
The set $\mathcal{R}_S$ of rules of the storage contains: . . . Again, based on $\mathit{inst}_j$, we consider the following cases.
Case 1. If $\mathit{inst}_j$ is an $\mathit{inc}_i$ instruction, then $\mathcal{R}_S$ contains the rules: . . .
Note that, in case $\mathit{inst}_j$ is a $\mathit{jz}_i(k)$ instruction, we do not need to introduce new rules in $\mathcal{R}_S$, as the 2CM does not modify the value of the registers when performing a $\mathit{jz}_i(k)$ instruction.
This construction allows multiple processes to act as controllers, and since we assume that the 2CM is deterministic, all the controllers behave the same.
To faithfully model an increment (resp. decrement) of register $i$, for $i \in \{A, B\}$, the controller processes have to ensure that exactly one process was moved to (resp. from) the location $\ell_i$ via the location $\ell_{\mathit{inc}_i}$ (resp. $\ell_{\mathit{dec}_i}$). In other words, for every $\mathit{inc}_i$ (resp. $\mathit{dec}_i$) instruction in the program $P$, for $i \in \{A, B\}$, if the guard of rule (1) (resp. (2)) is satisfied, the controllers move to the location corresponding to the next instruction, and the number of storage processes in the location $\ell_i$ is increased (resp. decreased) by exactly one. Otherwise, all controller processes are moved to the location $\ell_{\mathit{stuck}}$, and the number of processes in $\ell_i$ no longer corresponds to the value of register $i$.
Consider Fig. 3, which depicts the locations and rules that encode the increment of register $A$. The controllers that are in location $\ell^*_j$ move to the location $\ell_{j+1}$ if the guard $\#\{\ell_{\mathit{inc}_A}\} = 1$ is satisfied. If $\#\{\ell_{\mathit{inc}_A}\} \neq 1$, the controllers move from $\ell^*_j$ to the location $\ell_{\mathit{stuck}}$.
Similarly, for every $\mathit{jz}_i(k)$ instruction, all the controllers move to the location $\ell_k$ if the guard of rule (3) is satisfied, that is, if the number of storage processes in the location $\ell_i$ is 0, which corresponds to the value of register $i$ being equal to 0. Otherwise, the controllers move to the location corresponding to the next instruction in the program $P$.
The main invariant that ensures correctness of the construction is that every transition in a counter system induced by $\mathsf{STA}_M$ either faithfully simulates a step of the 2CM, or moves all of the controller processes to the location $\ell_{\mathit{stuck}}$. Furthermore, if there are no controller processes in $\ell_{\mathit{stuck}}$, the numbers of processes in the locations $\ell_A$ and $\ell_B$ equal the current values of the registers $A$ and $B$, respectively.
The formula $\varphi_M \equiv \#\{\ell_m\} \neq 0$ states that the controller processes reach the location $\ell_m \in \mathcal{L}_C$, which corresponds to the halting instruction $\mathit{inst}_m = \mathit{halt}$ of the program $P$.
We now formally state the reduction.
Theorem 1 Given $\mathsf{STA}_M$ and $\varphi_M$, the answer to the parameterized reachability question is positive iff the 2CM $M$ halts.
If $M$ halts, by the above construction we get that for some $p_M[n] \in \mathbb{N}$, there is a configuration $\sigma'$ in the counter system $CS(\mathsf{STA}_M, p_M)$, reachable from the initial configuration $\sigma$, such that $\varphi_M$ holds in $\sigma'$, i.e., $\sigma' \models \varphi_M$.
In the other direction, if $M$ does not halt, then for every $p_M[n] \in \mathbb{N}$ and every configuration $\sigma'$ in the counter system $CS(\mathsf{STA}_M, p_M)$ reachable from the initial configuration $\sigma$, it holds that $\sigma' \not\models \varphi_M$. This gives us undecidability of parameterized reachability.

Bounded diameter oracle
Given an STA, the diameter is the maximal number of transitions needed to reach all possible configurations in every counter system induced by the STA and an admissible instance $p \in P_{RC}$. We adapt the definition of diameter from [24].
Definition 5 (Diameter) Given an STA, the diameter is the smallest number $d$ such that for every $p \in P_{RC}$ and every path $\{\sigma_i\}_{i=0}^{d+1}$ of length $d+1$ in the counter system $CS(\mathsf{STA}, p)$ induced by STA and $p$, there exists a path $\{\sigma'_j\}_{j=0}^{e}$ of length $e \leq d$ in $CS(\mathsf{STA}, p)$ such that $\sigma'_0 = \sigma_0$ and $\sigma'_e = \sigma_{d+1}$.

Computing the diameter using SMT
In this section, we introduce an SMT-based semi-decision procedure for determining the diameter of an STA.
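By the above definition, the diameter is the smallest number $d$ that satisfies the formula
$$\forall p \in P_{RC}.\ \forall \sigma_0, \dots, \sigma_{d+1}.\ \Big(\mathit{Path}(\sigma_0, \sigma_{d+1}, d+1) \rightarrow \exists \sigma'_0, \dots, \sigma'_d.\ \sigma'_0 = \sigma_0 \wedge \mathit{Path}(\sigma'_0, \sigma'_d, d) \wedge \bigvee_{i=0}^{d} \sigma'_i = \sigma_{d+1}\Big) \qquad (4)$$
where
$$\mathit{Path}(\sigma_0, \sigma_k, k) \equiv \bigwedge_{i=0}^{k-1} \exists t_{i+1}.\ R(\sigma_i, t_{i+1}, \sigma_{i+1}),$$
and $R(\sigma, t, \sigma')$ is a predicate that evaluates to true iff $\sigma \xrightarrow{t} \sigma'$. Since we assume deadlock freedom, we can always encode a path $\mathit{Path}(\sigma'_0, \sigma'_d, d)$ of length exactly $d$, even if $\sigma'_i = \sigma_{d+1}$ already holds for some $i < d$: the shorter path can always be extended to length $d$. Formula (4) gives us the following procedure to determine the diameter:
1. initialize the candidate diameter $d$ to 1;
2. check whether the negation of formula (4) is unsatisfiable;
3. if yes, then output $d$ and terminate;
4. if not, then increment $d$ and jump to step 2.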
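The SMT-based procedure reasons about all admissible instances at once. For intuition only, the following Python sketch performs the analogous check explicitly for one fixed instance: it enumerates all configurations of a small toy STA with a fixed number of processes and, mirroring steps 1-4, increments the candidate $d$ until every path of length $d+1$ can be shortened to a path of length at most $d$. A path of length $d+1$ can be shortened exactly when its endpoints are at breadth-first-search distance at most $d$, so the check compares $d$ with the largest shortest-path distance. The toy automaton (locations v0, se, ac and its guards) is hypothetical and not one of the paper's benchmarks; for a single instance, the result is only a lower bound on the diameter of the STA.

```python
from collections import deque
from itertools import product
from typing import Callable, Iterator, List, NamedTuple, Set, Tuple

Config = Tuple[int, ...]  # counter vector kappa, indexed by location number


class Rule(NamedTuple):
    src: int                          # r.from
    dst: int                          # r.to
    guard: Callable[[Config], bool]   # r.phi, evaluated for the fixed instance


# Hypothetical toy STA (not from the paper): 0 = v0, 1 = se, 2 = ac.
TOY_RULES = [
    Rule(0, 0, lambda k: True),               # stay in v0
    Rule(0, 1, lambda k: True),               # v0 -> se
    Rule(1, 1, lambda k: k[1] + k[2] < 2),    # wait in se
    Rule(1, 2, lambda k: k[1] + k[2] >= 2),   # se -> ac
    Rule(2, 2, lambda k: True),               # stay in ac
]


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def successors(kappa: Config, rules: List[Rule]) -> Set[Config]:
    """Goals of all transitions enabled in kappa (every process moves)."""
    choices = []
    for loc, k in enumerate(kappa):
        out = [r for r in rules if r.src == loc and r.guard(kappa)]
        choices.append([(out, split) for split in compositions(k, len(out))])
    result = set()
    for combo in product(*choices):
        nxt = [0] * len(kappa)
        for out, split in combo:
            for rule, factor in zip(out, split):
                nxt[rule.dst] += factor
        result.add(tuple(nxt))
    return result


def max_distance(n_procs: int, n_locs: int, rules: List[Rule]) -> int:
    """Largest BFS distance from any configuration to a configuration it reaches."""
    best = 0
    for start in compositions(n_procs, n_locs):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in successors(cur, rules):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


def instance_diameter(n_procs: int, n_locs: int, rules: List[Rule]) -> int:
    """Steps 1-4 for a single instance: the smallest d >= 1 passing the check."""
    longest = max_distance(n_procs, n_locs, rules)
    d = 1
    while longest > d:  # some path of length d + 1 cannot be shortened to length <= d
        d += 1
    return d


print(instance_diameter(2, 3, TOY_RULES))  # prints 2
```

Explicit enumeration grows quickly with the number of processes, which is one reason the symbolic encoding, with the parameters as Skolem constants, is needed to cover all instances.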
If the procedure terminates, it outputs the diameter, which can be used as a completeness threshold for bounded model checking.We implemented this procedure and used a backend SMT solver to automate the test in step 2. In Sect.7 we report that for all our benchmarks, our implementation of this procedure terminates and outputs small values for the diameter (from 1 to 8).
Our tool generates an SMT-LIB file that encodes the negation of formula (4) as follows.
First, observe that the negation of the formula (4) has leading existential quantifiers followed by universal quantifiers.In the SMT-LIB encoding, we use Skolemization and replace the existentially quantified variables by constants.Thus, we declare integer constants n, t, and f, corresponding to the parameters, and add an assertion that encodes the resilience condition, e.g., n > t ∧ t ≥ f.This ensures that the values assigned to the parameters are admissible.
Second, we encode the formula $\mathit{Path}(\sigma_0, \sigma_{d+1}, d+1)$. We declare integer constants c_i_j that correspond to the value of the counter $\kappa[j]$ in configuration $\sigma_i$, for $0 \leq i \leq d+1$ and $0 \leq j < |\mathcal{L}|$, and integer constants t_i_k corresponding to the factor of rule $r_k$ in transition $t_i$, for $0 < i \leq d+1$ and $0 \leq k < |\mathcal{R}|$. We add assertions that every c_i_j and t_i_k is greater than or equal to 0, and additional assertions to ensure that the constants c_i_j satisfy the counter invariant $\chi$. Then, we add assertions that model the predicate $R(\sigma_i, t_{i+1}, \sigma_{i+1})$, for $0 \leq i < d+1$:
- for every rule $r_k \in \mathcal{R}$, we assert that its factor $t_{i+1}(r_k)$ is 0 if its guard is not satisfied in the configuration $\sigma_i$;
- we assert that $\sigma_i$ is the origin and $\sigma_{i+1}$ is the goal of $t_{i+1}$.
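The encoding steps above can be sketched as a small SMT-LIB generator in Python. It emits the parameter declarations, the resilience condition, the constants c_i_j and t_i_k with non-negativity constraints, and, for each step, the guard constraints and the origin/goal equations. This is a simplified, hypothetical sketch rather than the authors' tool: it omits the counter invariant $\chi$ and the diameter query, and it additionally asserts that the rule factors of each transition sum up to $N(p)$, as required by the definition of $T(p)$. Guards are supplied as functions that produce SMT-LIB terms over the counters of configuration $\sigma_i$.

```python
from typing import Callable, List, Sequence, Tuple

# A rule is (src, dst, guard), where guard(i) returns an SMT-LIB term over the
# counter constants c_i_j of configuration sigma_i (hypothetical interface).
SmtRule = Tuple[int, int, Callable[[int], str]]


def smt_sum(terms: Sequence[str]) -> str:
    """SMT-LIB sum; '+' takes at least two arguments, so handle 0 and 1 terms."""
    if not terms:
        return "0"
    if len(terms) == 1:
        return terms[0]
    return "(+ " + " ".join(terms) + ")"


def encode_path(num_locs: int, rules: List[SmtRule], d: int, params: Sequence[str],
                resilience: str, n_procs: str) -> str:
    """SMT-LIB constraints for Path(sigma_0, sigma_{d+1}, d+1)."""
    out = ["(set-logic LIA)"]
    out += [f"(declare-const {name} Int)" for name in params]
    out.append(f"(assert {resilience})")
    for i in range(d + 2):                      # configurations sigma_0 .. sigma_{d+1}
        for j in range(num_locs):
            out.append(f"(declare-const c_{i}_{j} Int)")
            out.append(f"(assert (>= c_{i}_{j} 0))")
    for i in range(1, d + 2):                   # transitions t_1 .. t_{d+1}
        for k in range(len(rules)):
            out.append(f"(declare-const t_{i}_{k} Int)")
            out.append(f"(assert (>= t_{i}_{k} 0))")
    for i in range(d + 1):                      # R(sigma_i, t_{i+1}, sigma_{i+1})
        for k, (_, _, guard) in enumerate(rules):
            out.append(f"(assert (=> (not {guard(i)}) (= t_{i + 1}_{k} 0)))")
        total = smt_sum([f"t_{i + 1}_{k}" for k in range(len(rules))])
        out.append(f"(assert (= {total} {n_procs}))")
        for j in range(num_locs):
            src = [f"t_{i + 1}_{k}" for k, (s, _, _) in enumerate(rules) if s == j]
            dst = [f"t_{i + 1}_{k}" for k, (_, g, _) in enumerate(rules) if g == j]
            out.append(f"(assert (= c_{i}_{j} {smt_sum(src)}))")      # origin
            out.append(f"(assert (= c_{i + 1}_{j} {smt_sum(dst)}))")  # goal
    return "\n".join(out)


EXAMPLE_RULES: List[SmtRule] = [            # hypothetical two-location automaton
    (0, 0, lambda i: f"(< c_{i}_1 (+ t 1))"),
    (0, 1, lambda i: f"(>= c_{i}_1 (+ t 1))"),
    (1, 1, lambda i: "true"),
]
smt = encode_path(2, EXAMPLE_RULES, 1, ("n", "t", "f"),
                  "(and (> n (* 3 t)) (>= t f))", "(- n f)")
print(smt)
```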
Finally, we encode the diameter query, that is, a universally quantified formula over integer variables x_i_j, modeling the value of the counters $\kappa[j]$ in $\sigma_i$, for $0 \leq i \leq d$ and $0 \leq j < |\mathcal{L}|$, and y_i_k, modeling the factors $t_i(r_k)$, for $0 < i \leq d$ and $0 \leq k < |\mathcal{R}|$. The body of the diameter query is an implication $P \rightarrow Q$, where $P$ is the conjunction of

Bounded diameter for a fragment of STA
In this section, we show that for a specific fragment of STA, we are able to give a theoretical bound on the diameter, similar to the asynchronous case [20,39].
The STAs in this fragment are monotonic and 1-cyclic. An STA is monotonic iff every counter atom changes its truth value at most once in every path of a counter system induced by the STA and an admissible instance $p \in P_{RC}$. This implies that every schedule can be partitioned into finitely many sub-schedules that satisfy a property we call steadiness. We call a schedule steady if the set of rules whose guards are satisfied is the same in all of its transitions. We also give a sufficient condition for monotonicity, using trapped counter atoms, defined below. In a 1-cyclic STA, the only cycles that can be formed by its rules are self-loops. Under these two conditions, we guarantee that for every steady schedule, there exists a steady schedule of bounded length that has the same origin and goal. We show that this bound depends on the number of counter atoms occurring in the guards of the STA and on the length of the longest path in the STA, denoted by $c$. The main result of this section is stated by the following theorem.
Theorem 2 For every feasible schedule $\tau$ in a counter system $CS(\mathsf{STA}, p)$, where STA is monotonic and 1-cyclic, and $p \in P_{RC}$, there exists a feasible schedule $\tau'$ of length $O(|\Psi| \cdot c)$ such that $\tau$ and $\tau'$ have the same origin and goal.
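To prove Theorem 2, we start by defining monotonic STA.
Definition 6 A synchronous threshold automaton $\mathsf{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$ is monotonic iff for every path $\{\sigma_i\}_{i=0}^{k}$ in the counter system $CS(\mathsf{STA}, p)$ induced by STA and $p \in P_{RC}$, and every counter atom $\psi \in \Psi$, we have
$$\big|\{\, i \mid 0 < i \leq k \text{ and } (\sigma_{i-1} \models \psi) \not\Leftrightarrow (\sigma_i \models \psi) \,\}\big| \leq 1.$$
To show that we can partition a schedule into finitely many sub-schedules, we need the notion of a context. A context of a transition $t \in T(p)$ is the set . . .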

Lemma 1 Every feasible schedule $\tau$ in a counter system induced by a monotonic STA has at most $|\Psi|$ context switches.
Proof Let $\tau = \{t_i\}_{i=1}^{k}$ be a feasible schedule and $\Psi$ the set of counter atoms appearing in the guards of the rules of the monotonic STA. For every $\psi \in \Psi$, there is at most one context switch $i$, for 0 . . .
Sufficient Condition for Monotonicity. We now introduce the notion of trapped counter atoms.
Definition 7 A set $L \subseteq \mathcal{L}$ of locations is called a trap iff for every $\ell \in L$ and every $r \in \mathcal{R}$ such that $\ell = r.\mathit{from}$, it holds that $r.\mathit{to} \in L$.
The following lemma states that, once a trapped counter atom holds in a configuration, it also holds in its immediate successor. . . .
Corollary 1 Let $\mathsf{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$ be an automaton such that all its counter atoms are trapped. Then STA is monotonic.
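Definition 7 and Corollary 1 reduce monotonicity to a closure test over the rules. The Python sketch below checks whether a set of locations is a trap and whether all location sets used in counter atoms are traps, assuming that a counter atom counts as trapped when its set of locations is a trap. Guards are irrelevant for this test, so rules are given only as edges; the edge list is a hypothetical toy example loosely following the description of Fig. 1.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (r.from, r.to); guards do not matter for traps


def is_trap(locs: Set[str], rules: Iterable[Edge]) -> bool:
    """Definition 7: every rule leaving a location in `locs` ends in `locs`."""
    return all(dst in locs for src, dst in rules if src in locs)


def all_atoms_trapped(atom_locs: Iterable[Set[str]], rules: Iterable[Edge]) -> bool:
    """Hypothesis of Corollary 1, assuming an atom #L >= a . pi + b is
    trapped when its location set L is a trap."""
    rules = list(rules)  # allow iterating over the rules several times
    return all(is_trap(locs, rules) for locs in atom_locs)


# Hypothetical toy edges, loosely following the description of Fig. 1.
EDGES = [("v0", "v0"), ("v0", "se"), ("v1", "se"), ("se", "ac"), ("ac", "ac")]
print(is_trap({"v1", "se", "ac"}, EDGES), is_trap({"v0"}, EDGES))
```

Intuitively, a process that enters a trap never leaves it, so the number of processes in a trap can only grow, and an atom of the form $\#L \geq a \cdot \pi + b$ over a trap $L$ can switch from false to true at most once.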
Steady Schedules.We define the notion of steadiness, similarly to [39].
The following proposition states a property of steady schedules in counter systems induced by monotonic STA and is a consequence of the monotonicity and Definition 8.It states that if the contexts of the first and the last transition of a schedule in a counter system induced by a monotonic STA are the same, then the schedule is steady.
We now focus on shortening steady schedules.That is, given a steady schedule, we construct a schedule of bounded length with the same origin and goal.
Observe that $\mathsf{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$ can be seen as a directed graph $G_{\mathsf{STA}}$, with vertices corresponding to the locations $\ell \in \mathcal{L}$ and edges corresponding to the rules $r \in \mathcal{R}$. We denote by $c$ the length of the longest path between two nodes in the graph $G_{\mathsf{STA}}$ and call it the longest chain of STA. If $G_{\mathsf{STA}}$ contains only cycles of length one, then STA is called 1-cyclic.
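The longest chain $c$ and the 1-cyclic property can both be computed from $G_{\mathsf{STA}}$. The following Python sketch drops self-loops, uses the standard-library `graphlib` module (Python 3.9 or later) to detect any remaining cycle, in which case the STA is not 1-cyclic, and otherwise computes the longest path by dynamic programming over a topological order. It measures the length of a chain in rules (edges), which is an assumption about how $c$ is counted; the edge list is a hypothetical toy example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List, Optional, Set, Tuple


def longest_chain(locations: Iterable[str],
                  rules: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Return c (counted in rules) if G_STA is 1-cyclic, and None otherwise."""
    preds: Dict[str, Set[str]] = {loc: set() for loc in locations}
    for src, dst in rules:
        if src != dst:                    # self-loops are allowed in a 1-cyclic STA
            preds[dst].add(src)
    try:
        order: List[str] = list(TopologicalSorter(preds).static_order())
    except CycleError:                    # a cycle of length > 1 remains
        return None
    longest = {loc: 0 for loc in preds}   # longest chain ending in loc
    for loc in order:                     # predecessors are visited first
        for pred in preds[loc]:
            longest[loc] = max(longest[loc], longest[pred] + 1)
    return max(longest.values(), default=0)


LOCATIONS = ["v0", "v1", "se", "ac"]
EDGES = [("v0", "v0"), ("v0", "se"), ("v1", "se"), ("se", "ac"), ("ac", "ac")]
print(longest_chain(LOCATIONS, EDGES))
```

Restricting to 1-cyclic automata is what makes this cheap: once self-loops are removed, the graph is acyclic, and longest paths in a DAG can be computed in linear time, whereas the longest simple path in a general graph is NP-hard.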
To shorten steady schedules, in addition to monotonicity, we require that the STA is also 1-cyclic.In the following, we assume that the schedules we shorten come from counter systems induced by monotonic and 1-cyclic STA.Intuitively, if a given schedule is longer than the longest chain of the STA, then in some transition of the schedule some processes followed a rule which is a self-loop.As processes may follow self-loops at different transitions, we cannot shorten the given schedule by eliminating transitions as a whole.Instead, we deconstruct the original schedule into sequences of process steps, which we call local runs, shorten the local runs, and reconstruct a new shorter schedule from the shortened local runs.The main challenge is to show that the newly obtained schedule is feasible and steady.Schedules as Multisets of Local Runs.We proceed by defining local runs and showing that each schedule can be represented by a multiset of local runs.
We call a local run a sequence $\rho = \{r_i\}_{i=1}^{k}$ of rules, for $r_i \in \mathcal{R}$, such that $r_i.\mathit{to} = r_{i+1}.\mathit{from}$, for $0 < i < k$. We denote by $\rho[i] = r_i$ the $i$-th rule in the local run $\rho$, and by $|\rho|$ the length of the local run. The following lemma shows that a feasible schedule can be deconstructed into a multiset of local runs.

Lemma 3
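For every feasible schedule $\tau = \{t_i\}_{i=1}^{k}$, there exists a multiset $(P, m)$, where 1. $P$ is a set of local runs of length $k$, and 2. $m \colon P \to \mathbb{N}$ is a multiplicity function, such that for every location $\ell \in L$ and every $i$ with $0 < i \le k$, it holds that

$$\sum_{r \in R,\ r.\mathit{from} = \ell} t_i(r) = \sum_{\varrho \in P,\ \varrho[i].\mathit{from} = \ell} m(\varrho).$$

Proof We proceed by induction on the length of the schedule.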
In the induction base, let τ be a schedule of length one, that is, τ consists of a single transition $t_1$. We define the multiset $(P, m)$ by setting $P = \{r \in R \mid t_1(r) \neq 0\}$ to be the set of rules that have nonzero factors in $t_1$, each viewed as a local run of length one, and for every $r \in P$, we set $m(r) = t_1(r)$.
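In the induction step, consider a schedule $\tau = \{t_i\}_{i=1}^{k+1}$ of length $k+1$. For the prefix $\tau' = \{t_i\}_{i=1}^{k}$ of τ, which is a feasible schedule of length $k$, we have, by the induction hypothesis, that there exists a multiset $(P', m')$ such that $P'$ is a set of local runs of length $k$, and for every location $\ell \in L$ and every $i$ with $0 < i \le k$ it holds that

$$\sum_{r \in R,\ r.\mathit{from} = \ell} t_i(r) = \sum_{\varrho \in P',\ \varrho[i].\mathit{from} = \ell} m'(\varrho).$$

Let $\sigma_k = g(t_k)$ be the goal configuration of $t_k$. For every $\ell \in L$, it holds that $\sigma_k[\ell] = \sum_{r \in R,\ r.\mathit{to} = \ell} t_k(r)$. Observe that $\sigma_k[\ell]$ also represents the number of local runs that end in a rule that points to the location $\ell$, that is,

$$\sigma_k[\ell] = \sum_{\varrho \in P',\ \varrho[k].\mathit{to} = \ell} m'(\varrho). \tag{6}$$

Given the transition $t_{k+1}$, let $R_{k+1} = \{r \in R \mid t_{k+1}(r) \neq 0\}$ be the set of rules that have nonzero factors in $t_{k+1}$. We define the set

$$P = \{\varrho \cdot r \mid \varrho \in P',\ r \in R_{k+1},\ \varrho[k].\mathit{to} = r.\mathit{from}\}$$

of local runs of length $k+1$, where $\varrho \cdot r$ is the local run obtained by appending $r$ to $\varrho$. We define the function $m \colon P \to \mathbb{N}$ by splitting, at every location $\ell$, the multiplicities of the local runs in $P'$ that end in $\ell$ among the rules in $R_{k+1}$ that start in $\ell$, so that $\sum_{r} m(\varrho \cdot r) = m'(\varrho)$ for every $\varrho \in P'$ and $\sum_{\varrho} m(\varrho \cdot r) = t_{k+1}(r)$ for every $r \in R_{k+1}$. Such a split exists because, by the feasibility of τ, the total multiplicity of the runs ending in $\ell$ equals the number of processes that leave $\ell$ in $t_{k+1}$ (see the equalities below). In particular, for every $\ell \in L$, it holds that

$$\sum_{\varrho \in P',\ \varrho[k].\mathit{to} = \ell} m'(\varrho) = \sum_{\varrho \cdot r \in P,\ r.\mathit{from} = \ell} m(\varrho \cdot r).$$

We now check that the multiset $(P, m)$ satisfies the two properties. Clearly, the set $P$ contains local runs of length $k+1$. For $0 < i \le k$, the second property carries over from $(P', m')$, since the multiplicity of every $\varrho \in P'$ is split among its extensions. For $i = k+1$, we use the assumption that τ is a feasible schedule. Hence, for every $\ell \in L$, it holds that

$$\sum_{r \in R,\ r.\mathit{from} = \ell} t_{k+1}(r) = \sigma_k[\ell] = \sum_{\varrho \in P',\ \varrho[k].\mathit{to} = \ell} m'(\varrho) = \sum_{\varrho' \in P,\ \varrho'[k+1].\mathit{from} = \ell} m(\varrho'),$$

by Proposition 1, (6), and the construction, respectively.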
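The inductive construction can be mirrored in a short Python sketch, assuming that transitions are given as dictionaries from rule names to factors. At each location, the runs that end there are matched greedily against the rules that leave it; this greedy split is our own choice of the multiplicity function m, and any split with the same sums would do. The sketch also includes the translation back from local runs to transitions, $t_i(r) = \sum_{\varrho[i] = r} m(\varrho)$, discussed next. The rules and the three-step schedule are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical rules: name -> (from-location, to-location).
RULES: Dict[str, Tuple[str, str]] = {
    "r1": ("v0", "v0"), "r2": ("v0", "se"), "r3": ("v1", "se"),
    "r4": ("se", "se"), "r5": ("se", "ac"), "r6": ("ac", "ac"),
}
# A hypothetical feasible schedule for three processes: rule -> factor.
SCHEDULE: List[Dict[str, int]] = [
    {"r1": 1, "r2": 1, "r3": 1},
    {"r2": 1, "r4": 1, "r5": 1},
    {"r5": 2, "r6": 1},
]


def decompose(schedule: List[Dict[str, int]],
              rules: Dict[str, Tuple[str, str]]) -> Counter:
    """Multiset of local runs (tuple of rule names -> multiplicity), as in Lemma 3."""
    runs = Counter({(r,): n for r, n in schedule[0].items() if n > 0})
    for t in schedule[1:]:
        ending, leaving = defaultdict(list), defaultdict(list)
        for run, mult in runs.items():  # runs grouped by where they end
            ending[rules[run[-1]][1]].append([run, mult])
        for r, n in t.items():  # rules of t grouped by where they start
            if n > 0:
                leaving[rules[r][0]].append([r, n])
        extended: Counter = Counter()
        for loc in sorted(set(ending) | set(leaving)):
            pool, outs = ending[loc], leaving[loc]
            # Feasibility: as many runs end in loc as processes leave it.
            assert sum(m for _, m in pool) == sum(n for _, n in outs), "not feasible"
            i = j = 0
            while i < len(pool) and j < len(outs):  # greedy split of multiplicities
                amount = min(pool[i][1], outs[j][1])
                extended[pool[i][0] + (outs[j][0],)] += amount
                pool[i][1] -= amount
                outs[j][1] -= amount
                if pool[i][1] == 0:
                    i += 1
                if outs[j][1] == 0:
                    j += 1
        runs = extended
    return runs


def recompose(runs: Counter, length: int) -> List[Counter]:
    """Translate back: t_i(r) is the sum of m(run) over runs with run[i] == r."""
    schedule = [Counter() for _ in range(length)]
    for run, mult in runs.items():
        for i, r in enumerate(run):
            schedule[i][r] += mult
    return schedule


runs = decompose(SCHEDULE, RULES)
print(dict(runs))
```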
We can also easily translate back from a multiset $(P, m)$ of local runs of length $k$ to a schedule $\tau = \{t_i\}_{i=1}^{k}$ of length $k$. That is, for every rule $r \in R$ and $0 < i \le k$, we define

$$t_i(r) = \sum_{\varrho \in P,\ \varrho[i] = r} m(\varrho)$$

and obtain the schedule τ of length $k$. The next lemma states that for every feasible schedule in a counter system induced by a 1-cyclic STA whose length exceeds the longest chain c, every local run of the corresponding multiset contains a self-loop.
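Otherwise, a local run without self-loops that is longer than c would have to visit some location twice. This implies that the STA contains a cycle which is not a self-loop, contradicting the assumption that it is 1-cyclic.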
For the counter systems of STA that are both monotonic and 1-cyclic, we show that their steady schedules can be shortened, so that their length does not exceed c + 1, where c is the longest chain (that is, the length of the longest path in the STA).
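Proof Let $\tau = \{t_i\}_{i=1}^{k+1}$ be a steady feasible schedule with $|\tau| = k+1 > c+1$, and consider its prefix $\theta = \{t_i\}_{i=1}^{k}$, which is also a steady and feasible schedule.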
By Lemma 3, there exists a multiset $(P, m)$ of local runs of length $k$ that describes θ. Since $k > c$, by Lemma 4, every local run $\varrho \in P$ contains a rule $r$ which is a self-loop, that is, $r.\mathit{from} = r.\mathit{to}$.
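Construct a set $P'$ of local runs of length $k-1$, such that every $\varrho' \in P'$ is obtained from some $\varrho \in P$ by removing one occurrence of a self-loop rule. Given a local run $\varrho' \in P'$ of length $k-1$, denote by $P(\varrho')$ the set of local runs $\varrho \in P$ of length $k$ such that $\varrho'$ was obtained by removing exactly one occurrence of a self-loop rule in $\varrho$. For every $i$, where $0 < i < k$, and every $\varrho \in P(\varrho')$, it holds that either $\varrho'[i] = \varrho[i]$, if $i$ precedes the position of the removed self-loop rule in $\varrho$, or $\varrho'[i] = \varrho[i+1]$ otherwise. Construct a multiplicity function $m' \colon P' \to \mathbb{N}$ such that for every $\varrho' \in P'$, we have $m'(\varrho') = \sum_{\varrho \in P(\varrho')} m(\varrho)$. The multiset $(P', m')$ defines a schedule $\theta' = \{t'_i\}_{i=1}^{k-1}$ of length $k-1$. We now show that θ' is feasible, and that it has the same origin and goal as θ.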
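To show that θ' is feasible, we show that for every $i$ with $1 < i < k$ and every $\ell \in L$, we have $\sum_{r.\mathit{to} = \ell} t'_{i-1}(r) = \sum_{r.\mathit{from} = \ell} t'_i(r)$. By the definition of $P(\varrho')$, we can associate with every local run $\varrho \in P(\varrho')$ an index $j_\varrho$, which denotes the position of the self-loop rule that was removed from $\varrho$ in order to obtain $\varrho'$. Removing $\varrho[j_\varrho]$ preserves the chaining of rules, since $\varrho[j_\varrho - 1].\mathit{to} = \varrho[j_\varrho].\mathit{from} = \varrho[j_\varrho].\mathit{to} = \varrho[j_\varrho + 1].\mathit{from}$ (whenever these neighbouring rules exist). Hence every $\varrho' \in P'$ is again a local run, that is, $\varrho'[i-1].\mathit{to} = \varrho'[i].\mathit{from}$. Thus,

$$\sum_{r.\mathit{from} = \ell} t'_i(r) = \sum_{\varrho' \in P',\ \varrho'[i].\mathit{from} = \ell} m'(\varrho') = \sum_{\varrho' \in P',\ \varrho'[i-1].\mathit{to} = \ell} m'(\varrho') = \sum_{r.\mathit{to} = \ell} t'_{i-1}(r).$$

This and Proposition 1 give us the feasibility of θ'.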
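To show that θ and θ' have the same origin, we show that for every $\ell \in L$, we have $o(t'_1)[\ell] = o(t_1)[\ell]$. Let $\ell \in L$ be a location. Without loss of generality, suppose that there is one local run $\varrho^* \in P$ whose first rule originates in $\ell$ and is a self-loop, that is, $\varrho^*[1].\mathit{from} = \ell$ and $\varrho^*[1].\mathit{from} = \varrho^*[1].\mathit{to}$, such that the self-loop rule $\varrho^*[1]$ was removed in order to obtain $\varrho'$. Thus, we have that $\varrho'[1].\mathit{from} = \varrho^*[2].\mathit{from} = \varrho^*[1].\mathit{to} = \ell$. As all the other local runs remain unchanged in their first rule, we have that

$$o(t'_1)[\ell] = \sum_{\varrho' \in P',\ \varrho'[1].\mathit{from} = \ell} m'(\varrho') = \sum_{\varrho \in P,\ \varrho[1].\mathit{from} = \ell} m(\varrho) = o(t_1)[\ell].$$

To show that θ and θ' have the same goal, we follow similar reasoning.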
Observe that the goal of θ is the origin of the transition $t_{k+1}$ in the schedule τ. Since θ and θ' have the same goal configuration, we can append $t'_k = t_{k+1}$ to the schedule θ' and obtain a new schedule $\tau' = \{t'_i\}_{i=1}^{k}$. The following holds for the schedule τ':
- it is feasible, since θ' is feasible and $g(t'_{k-1}) = g(t_k) = o(t_{k+1}) = o(t'_k)$;
- it is steady: the contexts of $t'_1$ and $t'_k$ are equal, since θ' has the same origin as θ, $t'_k = t_{k+1}$, and, as τ is steady, the contexts of $t_1$ and $t_{k+1}$ are equal. The steadiness of τ' follows from this and Proposition 2;
- τ and τ' have the same origin, as $o(t'_1) = o(t_1)$;
- τ and τ' have the same goal, as $g(t'_k) = g(t_{k+1})$.
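The multiset manipulation at the heart of this proof can be sketched in Python: remove one self-loop from every local run, add up the multiplicities of runs that collapse to the same shorter run (the function m'), and compare origins and goals. This is a hypothetical illustration with c = 2 and k = 3; it does not model contexts, so it checks neither steadiness nor the appended transition $t_{k+1}$.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical rules: name -> (from-location, to-location); here c = 2.
RULES: Dict[str, Tuple[str, str]] = {
    "r1": ("v0", "v0"), "r2": ("v0", "se"), "r3": ("v1", "se"),
    "r4": ("se", "se"), "r5": ("se", "ac"), "r6": ("ac", "ac"),
}
# Hypothetical multiset (P, m) of local runs of length k = 3 > c.
RUNS = Counter({("r1", "r2", "r5"): 1, ("r2", "r4", "r5"): 1, ("r3", "r5", "r6"): 1})


def drop_self_loop(run: Tuple[str, ...], rules) -> Tuple[str, ...]:
    """Remove the first self-loop rule of a local run (Lemma 4 guarantees one)."""
    for j, r in enumerate(run):
        src, dst = rules[r]
        if src == dst:
            return run[:j] + run[j + 1:]
    raise ValueError(f"local run {run} contains no self-loop")


def shorten(runs: Counter, rules) -> Counter:
    """(P', m'): m'(run') sums m(run) over all runs that shrink to run'."""
    shorter: Counter = Counter()
    for run, mult in runs.items():
        shorter[drop_self_loop(run, rules)] += mult
    return shorter


def origin(runs: Counter, rules) -> Counter:
    """Number of processes per location before the first step."""
    cfg: Counter = Counter()
    for run, mult in runs.items():
        cfg[rules[run[0]][0]] += mult
    return cfg


def goal(runs: Counter, rules) -> Counter:
    """Number of processes per location after the last step."""
    cfg: Counter = Counter()
    for run, mult in runs.items():
        cfg[rules[run[-1]][1]] += mult
    return cfg


def is_local_run(run: Tuple[str, ...], rules) -> bool:
    """Consecutive rules chain up: r_i.to == r_{i+1}.from."""
    return all(rules[a][1] == rules[b][0] for a, b in zip(run, run[1:]))


short = shorten(RUNS, RULES)
print(dict(short))  # {('r2', 'r5'): 2, ('r3', 'r5'): 1}
print(origin(short, RULES) == origin(RUNS, RULES), goal(short, RULES) == goal(RUNS, RULES))  # True True
```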
As a consequence of Lemmas 1 and 5, we obtain Theorem 2, which tells us that for any feasible schedule, there exists a feasible schedule of length $O(|\Phi| \cdot c)$ with the same origin and goal. This bound does not depend on the parameters, but only on the number of context switches and the longest chain c, which are properties of the STA.

Bounded model checking of safety properties
Once we obtain the diameter bound d (either using the procedure from Sect. 5.1 or by Theorem 2), we use it as a completeness threshold for bounded model checking. For the algorithms that we verify, we express the violations of their safety properties as reachability queries on bounded executions. The length of the bounded executions depends on d and on whether the algorithm assumes that every execution contains a clean round.
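To get a feel for why a small d suffices as a completeness threshold, the following Python sketch computes the diameter explicitly for one fixed parameter valuation: it runs a breadth-first search from all initial configurations of a toy counter system and reports the largest distance of a reachable configuration. This is only a stand-in for the SMT-based procedure of Sect. 5.1, which treats the parameters symbolically. The toy STA is an assumption: its guard names follow the caption of Fig. 4 (x counts the processes in v1, se, and ac), but its rules are ours, and only the n − f correct processes are represented by counters.

```python
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

LOCS = ("v0", "v1", "se", "ac")
# Hypothetical toy STA (from, to, guard); guard names follow Fig. 4, rules are assumed.
RULES = [
    ("v0", "v0", "phi0"), ("v0", "se", "phi1"), ("v1", "se", "true"),
    ("se", "se", "phi3"), ("se", "ac", "phi2"), ("ac", "ac", "true"),
]
Config = Tuple[int, ...]  # one counter per location, in the order of LOCS


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All ways to split `total` processes among `parts` >= 1 rules."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def successors(cfg: Config, n: int, t: int, f: int) -> Set[Config]:
    """One synchronous round: every process takes exactly one enabled rule."""
    x = cfg[1] + cfg[2] + cfg[3]  # processes in v1, se, ac
    holds = {"true": True, "phi0": x < t + 1, "phi1": x + f >= t + 1,
             "phi2": x + f >= n - t, "phi3": x < n - t}
    options: List[list] = []
    for i, loc in enumerate(LOCS):
        outs = [LOCS.index(dst) for src, dst, g in RULES if src == loc and holds[g]]
        if cfg[i] == 0:
            options.append([()])  # empty location: nothing moves
        elif not outs:
            return set()  # some process has no enabled rule: no successor
        else:
            options.append([tuple(zip(outs, s)) for s in compositions(cfg[i], len(outs))])
    result: Set[Config] = set()
    for choice in product(*options):
        new = [0] * len(LOCS)
        for moves in choice:
            for dst, count in moves:
                new[dst] += count
        result.add(tuple(new))
    return result


def diameter(n: int, t: int, f: int) -> Tuple[int, Dict[Config, int]]:
    """Largest BFS distance from the initial configurations (all n - f correct
    processes in v0 or v1), together with the distance of every reachable one."""
    correct = n - f
    initial = [(a, correct - a, 0, 0) for a in range(correct + 1)]
    dist: Dict[Config, int] = {cfg: 0 for cfg in initial}
    frontier = list(initial)
    while frontier:
        nxt = []
        for cfg in frontier:
            for succ in successors(cfg, n, t, f):
                if succ not in dist:
                    dist[succ] = dist[cfg] + 1
                    nxt.append(succ)
        frontier = nxt
    return max(dist.values()), dist


d, dist = diameter(n=4, t=1, f=1)
print(d, len(dist))  # 2 10
```

For n = 4, t = 1, and f = 1, the search reaches 10 configurations, all within 2 synchronous steps of an initial one.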
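Checking Safety for Algorithms that do not Assume a Clean Round. Here, we search for violations of safety properties in executions of length e ≤ d by checking satisfiability of the formula

$$\exists p \in P_{RC}.\ \exists \sigma_0, \dots, \sigma_e.\ \exists t_1, \dots, t_e.\ \mathit{Init}(\sigma_0) \wedge \mathit{Path}(\sigma_0, \sigma_e, e) \wedge \mathit{Bad}(\sigma_e) \tag{7}$$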
where the predicate $\mathit{Init}(\sigma)$ encodes that σ is an initial configuration, together with the constraints imposed on the initial configuration by the safety property, the formula $\mathit{Path}(\sigma_0, \sigma_e, e)$ is defined as in (5), and $\mathit{Bad}(\sigma)$ encodes the bad configuration, which, if reachable, violates safety. For example, the algorithm in Fig. 1 has to satisfy the safety property unforgeability (given in Table 2): if no process sets v to 1 initially, then no process ever sets accept to true. In our encoding, we check executions of length e ≤ d whose initial configuration has the value zero for the counter κ[v1], that is, κ[v1] = 0. In a bad configuration, the value of the counter κ[ac] is different from zero, i.e., κ[ac] ≠ 0. Thus, to find violations of unforgeability, in formula (7) we add the constraint κ[v1] = 0 to $\mathit{Init}(\sigma_0)$ and set $\mathit{Bad}(\sigma_e) \equiv \kappa[\mathrm{ac}] \neq 0$.

Checking Safety for Algorithms with a Clean Round. We check for violations of safety in executions of length e ≤ 2d, where $e = e_1 + e_2$ such that: (i) we find all reachable clean-round configurations in an execution of length $e_1$, for $e_1 \le d$, such that the last configuration $\sigma_{e_1}$ satisfies the clean round condition, and (ii) we check whether a bad configuration is reachable from $\sigma_{e_1}$ by a path of length $e_2 \le d$. That is, we check satisfiability of the formula

$$\exists p \in P_{RC}.\ \exists \sigma_0, \dots, \sigma_e.\ \exists t_1, \dots, t_e.\ \mathit{Init}(\sigma_0) \wedge \mathit{Path}(\sigma_0, \sigma_{e_1}, e_1) \wedge \mathit{Clean}(\sigma_{e_1}) \wedge \mathit{Path}(\sigma_{e_1}, \sigma_e, e_2) \wedge \mathit{Bad}(\sigma_e) \tag{8}$$

where $\mathit{Clean}(\sigma)$ stands for the clean round condition.
For example, one of the safety properties that the FloodMin algorithm for k = 1 (Fig. 2) has to satisfy is Agreement (Table 2), which requires that at most k different values are decided. In the original algorithm, the processes decide after ⌊t/k⌋ + 1 rounds, such that at least one of them is a clean round, in which at most k − 1 processes crash. In our encoding, we check paths of length e ≤ 2d. We enforce the clean round condition by asserting that the sum of the counters of the locations c0 and c1 is k − 1 = 0 in the configuration $\sigma_{e_1}$, that is, κ[c0] + κ[c1] = 0. The property 1-agreement is violated if in the last configuration both counters κ[v0] and κ[v1] are nonzero. That is, to check 1-agreement, in formula (8) we set $\mathit{Bad}(\sigma_e) \equiv \kappa[\mathrm{v0}] \neq 0 \wedge \kappa[\mathrm{v1}] \neq 0$.
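An explicit-state analogue of formula (7) for fixed parameters can be sketched in Python as well. The code below is self-contained and uses a hypothetical toy STA whose guard names follow the caption of Fig. 4 but whose rules are our assumption; it searches breadth-first for a configuration satisfying Bad within d steps from the initial configurations that satisfy the constraint from the property. For unforgeability, the constraint is κ[v1] = 0 and Bad is κ[ac] ≠ 0. The function names bmc, no_v1, and accepted are illustrative only.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple

LOCS = ("v0", "v1", "se", "ac")
# Hypothetical toy STA (from, to, guard); guard names follow Fig. 4, rules are assumed.
RULES = [
    ("v0", "v0", "phi0"), ("v0", "se", "phi1"), ("v1", "se", "true"),
    ("se", "se", "phi3"), ("se", "ac", "phi2"), ("ac", "ac", "true"),
]
Config = Tuple[int, ...]  # one counter per location, in the order of LOCS


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All ways to split `total` processes among `parts` >= 1 rules."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def successors(cfg: Config, n: int, t: int, f: int) -> Set[Config]:
    """One synchronous round: every process takes exactly one enabled rule."""
    x = cfg[1] + cfg[2] + cfg[3]  # processes in v1, se, ac
    holds = {"true": True, "phi0": x < t + 1, "phi1": x + f >= t + 1,
             "phi2": x + f >= n - t, "phi3": x < n - t}
    options: List[list] = []
    for i, loc in enumerate(LOCS):
        outs = [LOCS.index(dst) for src, dst, g in RULES if src == loc and holds[g]]
        if cfg[i] == 0:
            options.append([()])  # empty location: nothing moves
        elif not outs:
            return set()  # some process has no enabled rule: no successor
        else:
            options.append([tuple(zip(outs, s)) for s in compositions(cfg[i], len(outs))])
    result: Set[Config] = set()
    for choice in product(*options):
        new = [0] * len(LOCS)
        for moves in choice:
            for dst, count in moves:
                new[dst] += count
        result.add(tuple(new))
    return result


def bmc(n: int, t: int, f: int, d: int, init: Callable[[Config], bool],
        bad: Callable[[Config], bool]) -> Optional[List[Config]]:
    """Explicit-state analogue of (7) for fixed n, t, f: a shortest path of
    length e <= d from an initial configuration satisfying `init` to one
    satisfying `bad`, or None if no such path exists."""
    correct = n - f
    initial = [c for c in ((a, correct - a, 0, 0) for a in range(correct + 1)) if init(c)]
    parent: Dict[Config, Optional[Config]] = {c: None for c in initial}
    frontier = list(initial)
    for _ in range(d + 1):  # frontier holds the configurations at distance e
        for cfg in frontier:
            if bad(cfg):
                path = [cfg]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
        nxt = []
        for cfg in frontier:
            for succ in successors(cfg, n, t, f):
                if succ not in parent:
                    parent[succ] = cfg
                    nxt.append(succ)
        frontier = nxt
    return None


def no_v1(c: Config) -> bool:  # Init constraint of unforgeability: kappa[v1] = 0
    return c[1] == 0


def accepted(c: Config) -> bool:  # Bad: kappa[ac] != 0
    return c[3] != 0


print(bmc(4, 1, 1, d=2, init=no_v1, bad=accepted))  # None, since f <= t
print(bmc(4, 1, 2, d=2, init=no_v1, bad=accepted))  # a path of length 2, since f > t
```

With f ≤ t, no bad configuration is reachable in this toy model, while setting f > t, as in the erroneous encodings used in our experiments, yields a counterexample of length 2. Formula (8) would chain two such searches: first collect the reachable configurations that satisfy the clean round condition within d steps, then search for a bad configuration from them within another d steps.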

Experimental evaluation
The algorithms that we model using STA and verify by bounded model checking are designed for different fault models, which in our case are crashes, send omissions, or Byzantine faults. We now proceed by introducing our benchmarks. Their encodings, together with the implementations of the procedures for finding the diameter and applying bounded model checking, are available at [40].

Algorithms without a Clean Round Assumption. We consider three variants of the synchronous reliable broadcast algorithm, whose STA are monotonic and 1-cyclic (i.e., Theorem 2 applies). These algorithms assume different fault models:
- rb [23] (Fig. 1): reliable broadcast with at most t Byzantine faults;
- rb_hyb [29]: reliable broadcast with at most t hybrid faults: at most b Byzantine faults and at most s send omissions, with t = b + s;
- rb_omit [29]: reliable broadcast with at most t send omissions.
These algorithms have a structure similar to the one depicted in Fig. 2, with the exception of phase_king, phase_queen, and their variants from [29], whose loop body consists of several message exchange steps, which correspond to multiple rounds grouped in a phase. In each phase, a designated process acts as a coordinator.

Specifications. We call a process correct if it is non-faulty, and obedient if it is either correct or only performs send omissions, that is, if it works correctly on the receiving side. The safety properties that we check for our benchmarks are presented in Table 2, together with their formalization as LTL formulas over counter atoms.
Computing the Diameter. We implemented the procedure from Sect. 5.1 in Python. The implementation uses a back-end SMT solver (currently, Z3 and CVC4). Our tool computed diameter bounds for all of our benchmarks except one, even for those for which we do not have a theoretical guarantee. Our experiments reveal extremely low values for the diameter, ranging between 1 and 8. The values of the diameter and the time needed to compute them are presented in Table 3. We ran all our experiments on a machine with an Intel(R) Core(TM) i7 CPU and 16 GB of RAM, with z3-4.8.7 and cvc4-1.7. We set the timeout for both solvers to 24 hours. Computing the diameter with this configuration timed out for two benchmarks (king_BSW_hyb and queen_BSW_hyb). Thus, we ran the diameter procedure for these benchmarks on a machine with more computing power. On this machine, we obtained the diameter 8 for king_BSW_hyb with Z3 in 4 days. However, for queen_BSW_hyb, where d = 6, neither solver returned an answer for the negation of (4) within one week. Hence, we cannot conclude whether the diameter of queen_BSW_hyb is 6 or higher.

Checking the Algorithms. We implemented another Python function that encodes violations of the safety properties as reachability properties on paths of bounded length, as described in Sect. 6, and uses a back-end SMT solver to check their satisfiability. Table 3 contains the results that we obtained by checking reachability for our benchmarks, using the diameter bound computed by the procedure from Sect. 5.1 and, for the algorithms whose STA is monotonic and 1-cyclic, the diameter bound from Theorem 2.

Notes to Table 2. Validity (consensus, k-set agreement): every value that is decided upon is the initial value of some process. For consensus and 1-set agreement, we have V = {0, 1} and k = 1. For 2-set agreement, we have V = {0, 1, 2} and k = 2. The flag decided is true when the algorithm has run for ⌊t/k⌋ + 1 rounds. For agreement, we check that once decided is true, at least one location vi, for vi ∈ V, contains no processes.
To our knowledge, we are the first to verify the listed algorithms that tolerate send omission, Byzantine, and hybrid faults. For the algorithms with crash faults, our approach significantly improves on the results obtained using the abstraction-based method from [35].

Counterexamples. Our tool found a bug in the version of the phase_king algorithm given in [31], which was corrected in the version of the algorithm in [30]. The version from [31] had the wrong threshold '> n − t' in one guard, while the one in [30] had '≥ n − t' for the same guard. Although this fix was not due to the result produced by our tool, we believe that it is noteworthy, as it shows that our tool can quickly produce counterexamples. This motivated us to test our tool further and apply it to erroneous encodings of our benchmarks, which we produced ourselves. For rb, rb_hyb, rb_omit, phase_king, and phase_queen, we tweaked the resilience condition and introduced more faults than the algorithm tolerates, e.g., by setting f > t (instead of f ≤ t) in the STA in Fig. 1. For fair_cons, floodmin, floodset, and floodmin_omit, we checked executions without a clean round. For all of the erroneous encodings, our tool produces counterexamples in seconds.

SMT Solvers. In our evaluation, we used Z3 and CVC4 as back-end SMT solvers. To obtain the results presented in Table 3, we ran both solvers with their default configurations. We observed that on our benchmarks, Z3 generally performs better than CVC4.
When computing the diameter, we give the SMT solvers as input an SMT-LIB file that encodes the negation of the diameter query (4). The negation of (4) is a formula with one quantifier alternation, ∃∀, and is encoded as explained in Sect. 5.1. Note that in our encoding, the negation of (4) is already in Skolem normal form, as we introduce constants for the existentially quantified variables.
The size of the SMT-LIB files is proportional to the number of locations, rules, guards, and the diameter.For example, the SMT-LIB file for the simplest benchmark, floodmin_omit, for k = 1 and d = 1, has 184 lines of code, while the most complicated benchmark, king_BSW_hyb for d = 8, has 11691 lines of code.
We tried to understand how the SMT solvers deal with the negation of (4). With the option cbqi (counterexample-based quantifier instantiation [41]) disabled, CVC4 was not able to solve the diameter query for any of our benchmarks. Z3, in contrast, solved the diameter queries for all our benchmarks even with the options mbqi and ematching (model-based quantifier instantiation and E-matching, respectively) disabled. With verbose output enabled, Z3 reports that it uses the qsat procedure [42], a quantified satisfiability algorithm developed for linear arithmetic that is based on techniques for solving quantified Boolean formulas (QBF).

In Table 3, |L|, |R|, |Φ|, and RC are the number of locations, rules, and atomic guards, and the resilience condition of each STA; d is the diameter computed using SMT; c is the longest chain of the algorithms whose STA is monotonic and 1-cyclic; τ is the time to compute the diameter using SMT; T (SMT) is the time to check reachability using the diameter computed with the SMT procedure from Sect. 5.1; and T (Thm. 2) is the time to check reachability using the bound obtained by Theorem 2. For the cases where Theorem 2 is not applicable, we write (-). The experiments were run on a machine with an Intel(R) Core(TM) i7 CPU and 16 GB of RAM, using z3-4.8.7 and cvc4-1.7.

Discussion and related work
Parameterized verification of synchronous and partially synchronous distributed algorithms has recently gained attention. In both models, distributed computations are organized in rounds and processes (conceptually) move in lockstep. For consensus algorithms in the partially synchronous model, the authors of [8] introduced a consensus logic and (semi-)decision procedures. Later, the authors of [43] introduced a language for partially synchronous consensus algorithms and proved cut-off theorems specialized to the properties of consensus: agreement, validity, and termination. Concerning synchronous algorithms, the authors of [35] introduced an abstraction-based model checking technique for crash-tolerant synchronous algorithms with existential guards. In contrast to their work, we allow more general guards that contain linear expressions over the parameters, e.g., n − t. Our method offers more automation, and our experimental evaluation shows that our technique is faster than that of [35]. We introduce a synchronous variant of threshold automata, which were proposed in [20] for asynchronous algorithms. Several extensions of this model were recently studied in [44], but the synchronous case was not considered. In the STA we introduced in this paper, we defined the guards over the number of (globally) sent messages. An extension of STA that allows expressing guards over the (locally) received messages was recently proposed in [45]. The STA with receive variables from [45] are a formal model of process behavior that is closer to the pseudocode, and they enable automation of the implicit reasoning step that relates sent and received messages when encoding the guards. In this paper, we encoded the guards of our benchmarks manually.
The STA model extends the guarded protocols of [46], in which a process can only check whether a sum of counters is different from 0 or n. Generalizing the results from [46] to STA is not straightforward. In [47], safety of finite-state transition systems over infinite data domains was reduced to backwards reachability checking using a fixpoint computation, as long as the transition systems are well structured. It would be interesting to put our results in this context. A decidability result for liveness properties of parameterized timed networks was obtained in [48], employing linear programming for the analysis of vector addition systems with a parametric initial state. We plan to investigate the use of similar ideas for analyzing liveness properties of STA.

Fig. 4 A counter automaton for the STA in Fig. 1, with $\varphi_0 \equiv x < t+1$, $\varphi_1 \equiv x + f \ge t+1$, $\varphi_2 \equiv x + f \ge n-t$, $\varphi_3 \equiv x < n-t$, where x counts the number of processes in locations v1, se, ac, and n, t, f are counters for the parameters. On a path from $s_0$ to $s_7$, the counters ℓ ∈ {v0, v1, se, ac} are emptied, while the counters $\ell_n$ are populated. This models the transitions from one location to another in the current round.
The 1-cyclicity condition is reminiscent of flat counter automata [49]. In Fig. 4, we show a possible translation of an STA to a counter automaton (similar to the translation for asynchronous threshold automata from [44]). We note that the counter automaton is not flat, due to the presence of the outer loop, which models a transition to the next round. Knowing a bound d on the diameter (e.g., by Theorem 2), one can flatten the counter automaton by unfolding the outer loop d times. We also experimented with FAST [50] on two of our benchmarks: rb and floodmin for k = 1, depicted in Figs. 1 and 2, respectively. FAST terminated on rb, but took significantly longer than our tool on the same machine (hours rather than seconds). FAST ran out of memory when checking floodmin.
Our experiments show that STA that are neither monotonic nor 1-cyclic may still have bounded diameters. Finding other classes of STA for which one can derive diameter bounds is a subject of future work. Although we considered only reachability properties in this work, which turned out to be challenging, we plan to investigate completeness thresholds for liveness in the future.
Recently, reductions [16,18] have received increasing interest for automated verification of asynchronous fault-tolerant distributed algorithms. Several approaches [10][11][12][14][19] reduce the verification of asynchronous algorithms to the verification of different synchronized scenarios. Due to the nondeterminism that comes from faults and asynchrony, these synchronized versions have fault and communication semantics that differ from those considered in this paper. As future work, we would like to consider "synchronized" versions of asynchronous distributed algorithms.

Lemma 4
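Let τ be a feasible schedule in a counter system induced by a 1-cyclic STA, and $(P, m)$ its corresponding multiset of local runs of length $k = |\tau|$. If $|\tau| > c$, where c is the longest chain in the STA, then every local run $\varrho \in P$ contains a rule $r \in R$ such that $r.\mathit{from} = r.\mathit{to}$.

Proof Suppose $|\tau| > c$ and that there exists a local run $\varrho \in P$ such that for every $0 < i \le k$, it holds that $\varrho[i].\mathit{from} \neq \varrho[i].\mathit{to}$. Because c is the longest chain in the 1-cyclic STA, and since $|\varrho| > c$, there must exist indices $i, j$, with $0 < i < j \le k$, such that $\varrho[i].\mathit{from} = \varrho[j].\mathit{to}$, that is, $\varrho$ traverses a cycle that is not a self-loop. This contradicts the assumption that the STA is 1-cyclic.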
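Lemma 4 can be checked by brute force on small graphs: enumerate all local runs of length c + 1 and test that each contains a self-loop. The Python sketch below does this for a hypothetical 1-cyclic graph with c = 2. Adding the rule ac → v0 creates a cycle of length three, and a local run of length 3 without self-loops appears, so the lemma indeed relies on 1-cyclicity.

```python
from typing import List, Tuple

Rule = Tuple[str, str, str]  # (name, from-location, to-location)

# Hypothetical 1-cyclic example graph (self-loops allowed, no other cycles).
EXAMPLE_RULES: List[Rule] = [
    ("r1", "v0", "v0"), ("r2", "v0", "se"), ("r3", "v1", "se"),
    ("r4", "se", "se"), ("r5", "se", "ac"), ("r6", "ac", "ac"),
]


def longest_chain(rules: List[Rule]) -> int:
    """Longest path (in rules) avoiding self-loops; terminates only if 1-cyclic."""
    proper = [r for r in rules if r[1] != r[2]]

    def from_loc(loc: str) -> int:
        return max((1 + from_loc(dst) for _, src, dst in proper if src == loc), default=0)

    return max(from_loc(src) for _, src, _ in rules)


def local_runs(rules: List[Rule], length: int) -> List[List[Rule]]:
    """All local runs of the given length: consecutive rules chain up."""
    runs = [[r] for r in rules]
    for _ in range(length - 1):
        runs = [run + [r] for run in runs for r in rules if r[1] == run[-1][2]]
    return runs


def has_self_loop(run: List[Rule]) -> bool:
    return any(src == dst for _, src, dst in run)


c = longest_chain(EXAMPLE_RULES)
print(c, all(has_self_loop(run) for run in local_runs(EXAMPLE_RULES, c + 1)))  # 2 True
```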

Lemma 5
Let τ be a steady feasible schedule in a counter system induced by a monotonic and 1-cyclic STA. If |τ| > c + 1, where c is the longest chain in the STA, then there exists a steady feasible schedule τ' such that |τ'| = |τ| − 1, and τ, τ' have the same origin and goal.

Table 1
A long execution of reliable broadcast and the short representative. For this example, we assume t > 2.

Definition 1 A counter system w.r.t. an admissible instance p ∈ P_RC and an STA = (L, I, Φ, R, RC, χ) is the tuple

Table 2
The properties that we check for our benchmarks

Table 3
Results for our benchmarks, available at [40]. The timeout, denoted by t.o. in the table, was set to 24 hours. The diameter bound for king_BSW_hyb was computed on a machine with more computing power. The bounded model checking queries for king_BSW_hyb were run on the machine with an Intel(R) Core(TM) i7 CPU and 16 GB of RAM.