A Strategy for Automatic Verification of Stabilization of Distributed Algorithms

Automatic verification of convergence and stabilization properties of distributed algorithms has received less attention than verification of invariance properties. We present a semi-automatic strategy for verification of stabilization properties of arbitrarily large networks under structural and fairness constraints. We introduce a sufficient condition that guarantees that every fair execution of any (arbitrarily large) instance of the system stabilizes to the target set of states. In addition to specifying the protocol executed by each agent in the network and the stabilizing set, the user also has to provide a measure function or a ranking function. With this, we show that for a restricted but useful class of distributed algorithms, the sufficient condition can be automatically checked for arbitrarily large networks, by exploiting the small model properties of these conditions. We illustrate the method by automatically verifying several well-known distributed algorithms including link-reversal, shortest path computation, distributed coloring, leader election and spanning-tree construction.


Introduction
A system is said to stabilize to a set of states $X^*$ if all its executions reach some state in $X^*$ [1]. This property can capture common progress requirements such as absence of deadlocks, livelocks and counting to infinity, and achievement of self-stabilization in distributed systems. Stabilization is a liveness property and, like other liveness properties, it is generally impossible to verify automatically. In this paper, we present sufficient conditions that can be used to automatically prove stabilization of distributed systems with arbitrarily many participating processes.
The sufficient condition we propose is similar in spirit to the conditions given by Tsitsiklis [2] for convergence of iterative asynchronous processes. We require the user to provide a measure function, parameterized by the number of processes, such that its sub-level sets are invariant with respect to the transitions and there is a progress-making action for each state. Our point of departure is a noninterference condition that turned out to be essential for handling models of distributed systems. Furthermore, in order to handle nondeterministic communication patterns, our condition allows us to encode fairness conditions and different underlying communication graphs.
Next, we show that these conditions can be transformed into a ∀∃ form with a small model property. That is, there exists a cutoff number $N_0$ such that if the conditions are valid in all models of size up to $N_0$, then they are valid in all models. We use the small model results from [3] to determine the cutoff parameter, and apply this approach to verify several well-known distributed algorithms.
We have a Python implementation based on the sufficient conditions for stabilization developed in Section 3. We present precondition-effect style transition systems of the algorithms in Section 4; they serve as pseudo-code for our implementation. We encode the distributed system models in Python and use the Python API of the Z3 theorem prover [4] to check the conditions for stabilization for different model sizes. The conditions for invariance, progress and noninterference are passed to the SMT solver as assertions.
We have used this method to analyze a number of well-known distributed algorithms. These include a simple distributed coloring protocol, a self-stabilizing algorithm for constructing a spanning tree of the underlying network graph, a link-reversal routing algorithm, and a binary gossip protocol. Our experiments suggest that this method is effective for constructing formal proofs of stabilization for a variety of algorithms, provided the measure function is chosen carefully. Among other things, the measure function should be locally computable: the change from the measure of the previous state to that of the current state depends only on the nodes involved in the transition. It is difficult to determine whether such a measure function exists for a given problem. For instance, consider Dijkstra's self-stabilizing token ring protocol [5]. Its proof of correctness relies on the fact that the leading node cannot push for a value greater than its previous unique state until every other node has the same value. We were unable to capture this in a locally computable measure function because, if translated directly, it involves looking at every other node in the system.

Related Work
The motivation for our approach comes from the paper by John Tsitsiklis on convergence of asynchronous iterative processes [2]. That paper contains conditions for convergence similar to the sufficient conditions we state for stabilization. Our use of the measure function to capture stabilization is similar to the use of Lyapunov functions to prove stability as explored in [6], [7] and [8]. In [9], Dhama and Theel present a progress-monitor-based method for designing self-stabilizing algorithms with a weakly fair scheduler, starting from a self-stabilizing algorithm with an arbitrary, possibly very restrictive scheduler. They also use the existence of a ranking function to prove convergence under the original scheduler. Several authors [10] employ such functions to prove termination of distributed algorithms. While these may suggest what the measure function could be, in general they do not translate exactly into the measure functions that our verification strategy can employ. Our notion of fairness is also essential in dictating what the measure function should be, while not prohibiting too many behaviors. In [7], the assumption of serial execution semantics is compatible with our notion of fair executions.
The idea central to our proof method is the small model property of the sufficient conditions for stabilization. The small model nature of certain invariance properties of distributed algorithms has been used to verify them in [12]; an example is distributed landing protocols for small aircraft [11]. In [13], Emerson and Kahlon use a small model argument to perform parameterized model checking of ring-based message passing systems.

Preliminaries
We will represent distributed algorithms as transition systems. Stabilization is a liveness property and is closely related to convergence as defined by Tsitsiklis [2]; it is identical to the concept of region stability presented in [14]. We will use measure functions in our definition of stabilization. A measure function on a domain is a mapping from that domain to a well-ordered set. A well-ordered set W is one with a total ordering < such that every non-empty subset of W has a minimum element with respect to <. A measure function C : A → B induces a partition of A into sub-level sets: all elements of A that map to the same element b ∈ B under C are in the same sub-level set $L_b$.
We are interested in verifying stabilization of distributed algorithms independently of the number of participating processes or nodes. Hence, the transition systems are parameterized by N, the number of nodes. Given a non-negative integer N, we use [N] to denote the set of indices {1, 2, ..., N}.
Definition 1. For a natural number N and a set Q, a transition system A(N) with N nodes is a tuple (X, A, D), where a) X is the state space of the system; if the state space of each node is Q, then $X = Q^N$; b) A is a set of actions; c) D : X × A → X is a transition function that maps a system-state/action pair to a system state.
For any x ∈ X, the i-th component of x is the state of the i-th node, and we refer to it as x[i]. Given a transition system A(N) = (X, A, D), we write a(x) for the state D(x, a) obtained by applying action a to a state x ∈ X.
An execution of A(N) records a particular run of the distributed system with N nodes. Formally, an execution α of A(N) is a (possibly infinite) alternating sequence of states and actions $x_0, a_1, x_1, a_2, x_2, \ldots$, where each $x_i \in X$, each $a_i \in A$, and $D(x_i, a_{i+1}) = x_{i+1}$. Since the choice of actions in an execution is nondeterministic, it is reasonable to expect that not all executions stabilize. For instance, an execution in which not all nodes participate may not stabilize.

Definition 2.
A fairness condition F for A(N) is a finite collection of subsets of actions $\{A_i\}_{i \in I}$, where I is a finite index set. An action sequence $\sigma = a_1, a_2, \ldots$ is F-fair if every $A_i$ in F is represented in σ infinitely often, that is, $\forall A_i \in F,\ \forall k \in \mathbb{N},\ \exists m > k : a_m \in A_i$. For instance, if the fairness condition is the collection of all singleton subsets of A, then each action occurs infinitely often in an F-fair execution. This notion of fairness is similar to action-based fairness constraints in temporal logic model checking [15]. The network graph itself determines whether an action is enabled: every pair of adjacent nodes determines a continuously enabled action. An execution is strongly fair if it meets the following condition for every set of actions $A'$ whose actions are all infinitely often enabled: some action in $A'$ occurs infinitely often in the execution. An F-fair execution is an infinite execution whose corresponding sequence of actions is F-fair.
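F-fairness is a property of infinite action sequences, so it cannot be decided from a finite prefix in general. For the special case of an ultimately periodic ("lasso") sequence, made of a finite prefix followed by a loop repeated forever, an action occurs infinitely often exactly when it occurs in the loop. The Python sketch below uses this fact to decide F-fairness. It is our own illustration and not part of the paper's tool; the name `is_fair_lasso` and the encoding of actions as ring edges are assumptions.

```python
from typing import Hashable, Iterable, Sequence


def is_fair_lasso(prefix: Sequence[Hashable], loop: Sequence[Hashable],
                  fairness: Iterable[Iterable[Hashable]]) -> bool:
    """Decide whether the infinite sequence prefix, loop, loop, ... is F-fair.

    An action occurs infinitely often in such a sequence exactly when it occurs
    in the loop, so the finite prefix never affects fairness: each set A_i in
    the fairness condition F must share at least one action with the loop.
    """
    if not loop:
        raise ValueError("the loop must be non-empty to describe an infinite sequence")
    in_loop = set(loop)
    return all(in_loop & set(a_i) for a_i in fairness)


# Ring with three nodes; F is the collection of all singleton sets of edge actions.
F = [{(1, 2)}, {(2, 3)}, {(3, 1)}]
print(is_fair_lasso([], [(1, 2), (2, 3), (3, 1)], F))   # True: every edge recurs
print(is_fair_lasso([(3, 1)], [(1, 2), (2, 3)], F))     # False: (3, 1) occurs only once
```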
Definition 3. Given a system A(N), a fairness condition F, and a set of states $X^* \subseteq X$, A(N) is said to F-stabilize to $X^*$ if every F-fair execution $\alpha = x_0 a_1 x_1 a_2 \ldots$ of A(N) reaches $X^*$, that is, there exists $k \in \mathbb{N}$ such that $x_k \in X^*$.
This definition differs from the definition of self-stabilization found in the literature [1] in that the stabilizing set $X^*$ is not required to be an invariant of A(N). We view proving the invariance of $X^*$ as a separate problem. It can be approached using one of the available techniques for proving invariance of parameterized systems [3], [12].
Example 1 (Binary Gossip). We look at binary gossip in a ring network composed of N nodes. The nodes are numbered clockwise from 1, and nodes 1 and N are also neighbors. Each node has one of two states, 0 or 1. A pair of neighboring nodes communicates to exchange their values, and both nodes set their new state to the binary OR (∨) of the original values. Clearly, if all the interactions happen infinitely often and at least one node is initially in state 1, this transition system stabilizes to the state $x = 1^N$. The set of actions is specified by the set of edges of the ring. We first represent this protocol and its transitions using a standard precondition-effect style notation similar to the one used in [16].
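The above representation translates to the transition system A(N) = (X, A, D), whose components are as follows:
- $X = \{0, 1\}^N$;
- $A = \{step(i, j) \mid i \text{ and } j \text{ are neighbors in the ring}\}$;
- $D(x, step(i, j)) = x'$, where $x'[i] = x'[j] = x[i] \vee x[j]$ and $x'[k] = x[k]$ for all $k \notin \{i, j\}$.
We define the stabilizing set to be $X^* = \{1^N\}$. The fairness condition is $F = \{\{a\} \mid a \in A\}$, which ensures that all possible interactions take place infinitely often. In Section 3 we will discuss how this type of stabilization can be proven automatically with a user-defined measure function.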
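A minimal Python sketch of the binary gossip system A(N) described above follows. It assumes nodes numbered 1..N with N ≥ 3, actions encoded as pairs (i, j) for ring edges, and states encoded as tuples of 0/1. The helper names (`ring_edges`, `step`, `run_round_robin`, `stabilizes_within`) are hypothetical. A round-robin pass over all edges, repeated forever, is one particular F-fair schedule. Simulating a few rounds therefore illustrates stabilization to $1^N$ but does not prove it.

```python
from itertools import product


def ring_edges(n):
    """Actions of the ring on nodes 1..n (n >= 3): one step(i, j) per neighbouring pair."""
    return [(i, i % n + 1) for i in range(1, n + 1)]


def step(x, i, j):
    """D(x, step(i, j)): nodes i and j both take x[i] OR x[j]; other nodes keep their value."""
    y = list(x)
    y[i - 1] = y[j - 1] = x[i - 1] | x[j - 1]
    return tuple(y)


def run_round_robin(x, n, rounds):
    """Apply every ring action once per round and return all visited states."""
    states = [x]
    for _ in range(rounds):
        for i, j in ring_edges(n):
            x = step(x, i, j)
            states.append(x)
    return states


def stabilizes_within(n, rounds):
    """True if every state with at least one 1 reaches 1^n after `rounds` rounds."""
    target = (1,) * n
    return all(run_round_robin(x, n, rounds)[-1] == target
               for x in product((0, 1), repeat=n) if any(x))


print(run_round_robin((0, 0, 1, 0, 0), 5, 1)[-1])   # (1, 1, 1, 1, 1)
```

In each round, every 0 that is adjacent to a 1 at the start of the round becomes 1, because the edge between them is applied during that round. Hence n − 1 rounds always suffice when at least one node starts in state 1.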

A Sufficient Condition for Stabilization
We state a sufficient condition for stabilization in terms of the existence of a measure function. The measure functions are similar to Lyapunov stability conditions in control theory [17] and well-founded relations used in proving termination of programs and rewriting systems [18].
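Theorem 1. Suppose A(N) = (X, A, D) is a transition system parameterized by N, with a fairness condition F, and let $X^*$ be a subset of X. Suppose further that there exists a measure function $C : X \to W$, with minimum element ⊥, such that the following conditions hold for all states $x \in X$:
(invariance) $\forall a \in A,\ C(a(x)) \le C(x)$;
(progress) $C(x) \neq \bot \Rightarrow \exists A_x \in F,\ \forall a \in A_x,\ C(a(x)) < C(x)$;
(noninterference) $\forall a, b \in A,\ C(a(x)) < C(x) \Rightarrow C(a(b(x))) < C(x)$;
(minimality) $C(x) = \bot \Rightarrow x \in X^*$.
Then A(N) F-stabilizes to $X^*$.
Proof. Consider an F-fair execution $\alpha = x_0 a_1 x_1 \ldots$ of A(N) and let $x_i$ be an arbitrary state in that execution. If $C(x_i) = \bot$, then by minimality, $x_i \in X^*$. Otherwise, by the progress condition there exists a set of actions $A_{x_i} \in F$ such that every action in $A_{x_i}$ decreases C at $x_i$. Since α is F-fair, there exists $k > i$ such that $a_k \in A_{x_i}$; in particular, $C(a_k(x_i)) < C(x_i)$. Let β denote the action sequence $a_{i+1}, \ldots, a_{k-1}$, so that $x_k = a_k(\beta(x_i))$. We show that $C(x_k) < C(x_i)$.
We use induction on the length n of β, proving the following claim for every state x with $C(a_k(x)) < C(x)$: $C(a_k(\beta(x))) < C(x)$. The base case of the induction is n = 0, which is trivially true. As induction hypothesis, assume that the claim holds for every β of length j < n. Write β = b β', where b ∈ A is the first action and β' has length n − 1. We have to show that $C(a_k(\beta'(b(x)))) < C(x)$. There are two cases to consider. If $C(b(x)) < C(x)$, then by invariance $C(a_k(\beta'(b(x)))) \le C(b(x)) < C(x)$. Otherwise, by invariance $C(b(x)) = C(x)$; let $x' = b(x)$. By noninterference, $C(a_k(x')) < C(x) = C(x')$, which implies that $C(a_k(x')) < C(x')$. By applying the induction hypothesis to x', we have the required inequality $C(a_k(\beta'(b(x)))) < C(x') = C(x)$.
So far we have proved that either a state $x_i$ in an execution is already in the stabilizing set, or there is a state $x_k$, k > i, such that $C(x_k) < C(x_i)$. Suppose no state of α had measure ⊥. Applying this argument repeatedly, starting from $x_0$, would then produce an infinite strictly descending chain of measure values. Since < is a well-ordering on C(X), no such chain exists. Thus there exists j such that $C(x_j) = \bot$. By minimality, $x_j \in X^*$, so α reaches $X^*$. Moreover, by invariance C remains ⊥ at every later state of α, so by minimality these states are in $X^*$ as well. Hence A(N) F-stabilizes to $X^*$.
We make some remarks on the conditions of Theorem 1. It requires the measure function C and the transition system A(N) to satisfy four conditions. The invariance condition requires the sub-level sets of C to be invariant with respect to all the transitions of A(N). The progress condition requires a fair set of actions $A_x$ for every state x whose measure value is not already ⊥, such that $A_x$ takes x to a lower value of C.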
The minimality condition asserts that C(x) drops to ⊥ only if the state is in the stabilizing set X * . This is a part of the specification of the stabilizing set.
The noninterference condition concerns an action a that decreases the measure at state x. It requires that applying a to a state x' = b(x), obtained from x by some action b, also yields a measure value below that of x. This does not necessarily mean that a decreases the measure value at x'. It only means that x' either already has a measure value less than that of x when a is applied, or that the value drops after the application. Note also that the progress condition of Theorem 1 is stated per state. It does not require, for every sub-level set of C, a fair action that takes all states in the sub-level set to a smaller sub-level set.
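For a single finite instance, the four conditions of Theorem 1 can be checked by brute-force enumeration. The paper uses SMT queries for this; the Python sketch below substitutes explicit enumeration and therefore covers only one fixed N. The sketch applies the check to binary gossip with the measure C(x) = number of nodes in state 0 (⊥ = 0). This measure is a hypothetical choice made for illustration. The all-zero state is left out of the state space, because no interaction can produce a 1 from it, so the progress condition cannot hold there. The function names `check_theorem1` and `gossip_instance` are hypothetical.

```python
from itertools import product


def check_theorem1(states, actions, apply, C, bottom, in_target, fairness):
    """Check the four conditions of Theorem 1 on a finite instance by enumeration.

    apply(a, x) returns a(x); C maps states to values compared with < and <=;
    fairness is the list of action sets making up F.
    """
    ok = {"invariance": True, "progress": True,
          "noninterference": True, "minimality": True}
    for x in states:
        cx = C(x)
        if any(not C(apply(a, x)) <= cx for a in actions):
            ok["invariance"] = False
        if cx != bottom and not any(
                all(C(apply(a, x)) < cx for a in A_x) for A_x in fairness):
            ok["progress"] = False
        for a in actions:
            if C(apply(a, x)) < cx and any(
                    not C(apply(a, apply(b, x))) < cx for b in actions):
                ok["noninterference"] = False
        if cx == bottom and not in_target(x):
            ok["minimality"] = False
    return ok


def gossip_instance(n, include_all_zero=False):
    """Binary gossip on a ring of n >= 3 nodes, hypothetical measure
    C(x) = number of nodes in state 0 (bottom = 0), singleton fairness."""
    edges = [(i, i % n + 1) for i in range(1, n + 1)]

    def apply(a, x):
        i, j = a
        y = list(x)
        y[i - 1] = y[j - 1] = x[i - 1] | x[j - 1]
        return tuple(y)

    states = [x for x in product((0, 1), repeat=n) if include_all_zero or any(x)]
    return dict(states=states, actions=edges, apply=apply,
                C=lambda x: x.count(0), bottom=0,
                in_target=lambda x: all(x),
                fairness=[{e} for e in edges])


print(check_theorem1(**gossip_instance(5)))   # all four conditions map to True
print(check_theorem1(**gossip_instance(5, include_all_zero=True))["progress"])  # False
```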
To see the motivation for the noninterference condition, consider a sub-level set with two states $x_1$ and $x_2$ such that $b(x_1) = x_2$ and $a(x_2) = x_1$. Suppose also that a is the only action with $C(a(x_1)) < C(x_1)$. Consider the infinite execution $x_1 b x_2 a x_1 b x_2 \ldots$. It is fair with respect to both a and b, but a never occurs at $x_1$, so the execution never enters a smaller sub-level set. This system violates noninterference: $C(a(x_1)) < C(x_1)$, but $C(a(b(x_1))) = C(x_1)$.
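The counterexample can be replayed concretely with a hypothetical three-state system, sketched below in Python. States x1 and x2 share one sub-level set and y lies below it, and a is the only action that decreases C at x1. The schedule b, a, b, a, ... is fair to both actions but never reaches y. Enumerating all (x, a, b) triples finds exactly one noninterference violation. The state and action names are illustrative only.

```python
# Hypothetical system: x1 and x2 share one sub-level set, y lies strictly below it.
C = {"x1": 1, "x2": 1, "y": 0}
D = {("x1", "a"): "y",  ("x1", "b"): "x2",
     ("x2", "a"): "x1", ("x2", "b"): "x2",
     ("y", "a"): "y",   ("y", "b"): "y"}
ACTIONS = ("a", "b")


def run(x, schedule):
    """Return the states visited when the actions in `schedule` are applied from x."""
    trace = [x]
    for act in schedule:
        x = D[(x, act)]
        trace.append(x)
    return trace


def noninterference_violations():
    """Triples (x, a, b) with C(a(x)) < C(x) but not C(a(b(x))) < C(x)."""
    return [(x, a, b) for x in C for a in ACTIONS for b in ACTIONS
            if C[D[(x, a)]] < C[x] and not C[D[(D[(x, b)], a)]] < C[x]]


print("y" in run("x1", "ba" * 50))    # False: the fair schedule never leaves level 1
print(noninterference_violations())   # [('x1', 'a', 'b')]
```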
In our examples, each action changes the state of one node or at most a small set of nodes. The measure functions, in contrast, succinctly capture global progress conditions such as the number of nodes that have different values. Thus, it is often impossible to find an action that reduces the measure function for all states in a sub-level set. In Section 4, we will show how a candidate measure function can be checked for arbitrarily large instances of a distributed algorithm. This leads to a method for automatic verification of stabilization.

Automating Stabilization Proofs
For finite instances of a distributed algorithm, we can use formal verification tools to check the sufficient conditions of Theorem 1 and thereby prove stabilization. When the invariance, progress and noninterference conditions of a transition system can be encoded in an SMT solver, these checks can be performed automatically. Our goal, however, is to prove stabilization of algorithms with an arbitrary or unknown number of participating nodes. We would like to define a parameterized family of measure functions and show that $\forall N \in \mathbb{N}$, A(N) satisfies the conditions of Theorem 1. This is a parameterized verification problem, and most of the prior work on this problem has focused on verifying invariant properties (see Section 1 for related work). Our approach is based on exploiting the small model nature of the logical formulas representing these conditions.
In [3], a class of ∀∃ formulas with small model properties was used to check invariants of timed distributed systems on arbitrary networks. In this paper, we use the same class of formulas to encode the sufficient conditions for checking stabilization. We use the following small model theorem as presented in [3]:
Theorem 2. Let Γ(N) be a formula of the form $\forall i_1, \ldots, i_k \in [N]\ \exists j_1, \ldots, j_m \in [N],\ \phi(i_1, \ldots, i_k, j_1, \ldots, j_m)$. Here φ is a quantifier-free formula involving the index variables and the global and local variables of the system. Then $\forall N \in \mathbb{N} : \Gamma(N)$ is valid iff Γ(n) is satisfied by all models of size n, for all $n \le N_0 = (e + 1)(k + 2)$. In this bound, e is the number of index array variables in φ, and k is the largest subscript of the universally quantified index variables in Γ(N).
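Theorem 2 reduces validity for every N to finitely many checks. The Python sketch below only computes the cutoff and loops over the sizes. Here `gamma(n)` is a hypothetical user-supplied oracle that returns whether Γ(n) holds in all models of size n; in the paper's setting that would be an SMT query. The result is meaningful only when Γ has the ∀∃ form required by Theorem 2. We assume that model sizes start at 1.

```python
def small_model_cutoff(e: int, k: int) -> int:
    """N_0 = (e + 1)(k + 2) from Theorem 2.

    e: number of index array variables in phi;
    k: largest subscript of the universally quantified index variables.
    """
    if e < 0 or k < 0:
        raise ValueError("e and k must be non-negative")
    return (e + 1) * (k + 2)


def valid_for_all_sizes(gamma, e, k):
    """Decide 'forall N: Gamma(N)' by checking the sizes 1..N_0 only.

    gamma(n) is a hypothetical oracle returning True iff Gamma(n) holds in all
    models of size n. The reduction is sound only for the forall-exists shape
    required by Theorem 2.
    """
    return all(gamma(n) for n in range(1, small_model_cutoff(e, k) + 1))


checked = []
print(small_model_cutoff(2, 3))                                          # 15
print(valid_for_all_sizes(lambda n: checked.append(n) or True, 1, 2))    # True
print(checked)                                    # [1, 2, 3, 4, 5, 6, 7, 8]
```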

Computing the Small Model Parameter
Computing the small model parameter $N_0$ for verifying a stabilization property of a transition system first requires expressing all the conditions of Theorem 1 as formulas with the structure specified by Theorem 2. There are a few important considerations in doing so.
Translating the sufficient conditions. In their original form, none of the conditions of Theorem 1 have the structure of the ∀∃-formulas required by Theorem 2. For instance, a leading ∀x ∈ X quantification is not allowed by Theorem 2, so we transform the conditions into formulas with implicit quantification. Take for instance the invariance condition, $\forall x \in X, \forall a \in A,\ C(a(x)) \le C(x)$. Checking its validity is equivalent to checking that $\forall a \in A,\ (a(x) = x' \Rightarrow C(x') \le C(x))$ holds for all valuations of the free variables x and x'. Here we also need to check that x and x' are actually states and that they are related by the transition function. For instance, in the binary gossip example we get $\forall i, j \in [N],\ (E(i, j) \wedge x' = D(x, step(i, j))) \Rightarrow C(x') \le C(x)$, where x and x' range over arrays representing states.
Interaction graphs. In distributed algorithms, the underlying network topology dictates which pairs of nodes can interact, and therefore the set of actions. We need to be able to specify the available set of actions in the format demanded by the small model theorem. In this paper we focus on specific classes of graphs: complete graphs, star graphs, rings, k-regular graphs, and k-partite complete graphs. For these classes we know how to capture the constraints using predicates in the requisite form. For instance, we use edge predicates E(i, j), where i and j are node indices and the predicate is true iff there is an undirected edge between them in the interaction graph. For a complete graph, E(i, j) = true. In the binary gossip example, the interaction graph is a ring, and $E(i, j) \Leftrightarrow (j = i + 1) \vee (i = j + 1) \vee (i = 1 \wedge j = N) \vee (i = N \wedge j = 1)$. If the graph is d-regular, we use d arrays $reg_1, \ldots, reg_d$, where there is an edge between k and l iff $\exists i,\ reg_i[k] = l$. This only expresses that the degree of each vertex is d; it gives no information about the connectivity of the graph. For that, we can use a separate index-valued array that satisfies certain constraints iff the graph is connected.
These constraints need to be expressed in a format satisfying the small model property as well. Other graph predicates can be introduced based on the model requirements, for instance Parent(i, j), Child(i, j) and Direction(i, j). In our case studies we verify stabilization under the assumption that all pairs of nodes in E interact infinitely often, so F consists of singleton sets of actions. The progress condition then simplifies to $C(x) \neq \bot \Rightarrow \exists a \in A,\ C(a(x)) < C(x)$. More general fairness constraints can be encoded in the same way as graph constraints.
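The Python sketch below instantiates the translated invariance condition and the simplified progress condition for binary gossip on a ring, by brute force. Following the translation above, x and x' are treated as free variables that range over all valuations in {0,1}^n; x' is not computed from x. The enumeration stands in for the SMT solver and covers one fixed n only. The measure C (number of 0s) is a hypothetical choice, and the all-zero state is excluded from the progress check because gossip cannot leave it.

```python
from itertools import product


def E_ring(i, j, n):
    """Ring edge predicate on nodes 1..n (nodes 1 and n are also adjacent)."""
    return j == i + 1 or i == j + 1 or (i == 1 and j == n) or (i == n and j == 1)


def is_transition(x, i, j, xp):
    """x' = D(x, step(i, j)): both endpoints take x[i] OR x[j]; others unchanged."""
    v = x[i - 1] | x[j - 1]
    return xp[i - 1] == v and xp[j - 1] == v and all(
        xp[k] == x[k] for k in range(len(x)) if k not in (i - 1, j - 1))


def related_pairs(n):
    """All valuations (x, x', i, j) satisfying E(i, j) and the transition relation."""
    states = list(product((0, 1), repeat=n))
    for x, xp in product(states, repeat=2):
        for i, j in product(range(1, n + 1), repeat=2):
            if E_ring(i, j, n) and is_transition(x, i, j, xp):
                yield x, xp, i, j


def invariance(n, C):
    """forall x, x', i, j: E(i, j) and x' = D(x, step(i, j)) => C(x') <= C(x)."""
    return all(C(xp) <= C(x) for x, xp, _, _ in related_pairs(n))


def progress(n, C, bottom=0):
    """Singleton fairness: every non-zero state with C(x) != bottom has a
    related x' with C(x') < C(x)."""
    improving = {x for x, xp, _, _ in related_pairs(n) if C(xp) < C(x)}
    return all(x in improving for x in product((0, 1), repeat=n)
               if any(x) and C(x) != bottom)


zeros = lambda x: x.count(0)
print(invariance(4, zeros), progress(4, zeros))   # True True
```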

Case studies
In this section, we present the details of applying our strategy to various distributed algorithms. We begin by defining some predicates that are used in our case studies. Recall that we want to check the conditions of Theorem 1 using the transformation outlined in Section 3.3. That transformation involves variables x, x', etc., which represent states of a distributed system that are related by transitions. These conditions are encoded using the following predicates, which we illustrate using the binary gossip example given in Section 2:
- isState(x) returns true iff the array variable x represents a state of the system. In the binary gossip example, $isState(x) = \forall i \in [N],\ (x[i] = 0 \vee x[i] = 1)$.
- isAction(a) returns true iff a is a valid action for the system. Again, for the binary gossip example, isAction(step(i, j)) = true for all i, j ∈ [N] in the case of a complete communication graph.
- isTransition(x, step(i, j), x') returns true iff the state x goes to x' when the transition function for action step(i, j) is applied to it. In the binary gossip example, $isTransition(x, step(i, j), x') = (x'[i] = x'[j] = x[i] \vee x[j]) \wedge \forall k \notin \{i, j\},\ x'[k] = x[k]$.
- Combining the above predicates, we define $P(x, x', i, j) = isState(x) \wedge isState(x') \wedge isAction(step(i, j)) \wedge isTransition(x, step(i, j), x')$.
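The predicates above can be mirrored directly in Python for a small, fixed N on a complete graph. The sketch then uses P to encode noninterference in the relational form shown in the leader election case study below. The Python names (`is_state`, `is_action`, `is_transition`, `P`, `successors`) are our own renderings of isState, isAction, isTransition and P. Enumerating candidate valuations replaces the SMT solver, and the measure passed in is a hypothetical choice.

```python
from itertools import product

N = 3                                               # a small, fixed instance size
STATES = list(product((0, 1), repeat=N))            # candidate valuations of x, x', ...
INDICES = list(product(range(1, N + 1), repeat=2))


def is_state(x):
    """isState(x): x is an array of N binary values."""
    return len(x) == N and all(v in (0, 1) for v in x)


def is_action(i, j):
    """isAction(step(i, j)): on a complete graph, true for all i, j in [N]."""
    return 1 <= i <= N and 1 <= j <= N


def is_transition(x, i, j, xp):
    """isTransition(x, step(i, j), x'): i and j take x[i] OR x[j]; others unchanged."""
    v = x[i - 1] | x[j - 1]
    return xp[i - 1] == v and xp[j - 1] == v and all(
        xp[k] == x[k] for k in range(N) if k not in (i - 1, j - 1))


def P(x, xp, i, j):
    return is_state(x) and is_state(xp) and is_action(i, j) and is_transition(x, i, j, xp)


def successors(x, i, j):
    """All x' with P(x, x', i, j), found by enumerating valuations."""
    return [xp for xp in STATES if P(x, xp, i, j)]


def noninterference(C):
    """forall q, r, s, t: P(x, x', q, r) and P(x, x'', s, t) and P(x'', x''', q, r)
    and C(x') < C(x)  =>  C(x''') < C(x)."""
    for x in STATES:
        for q, r in INDICES:
            if not any(C(x1) < C(x) for x1 in successors(x, q, r)):
                continue
            for s, t in INDICES:
                for x2 in successors(x, s, t):
                    if any(not C(x3) < C(x) for x3 in successors(x2, q, r)):
                        return False
    return True


print(successors((0, 1, 0), 1, 2))             # [(1, 1, 0)]
print(noninterference(lambda x: x.count(0)))   # True
```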

Graph Coloring
This algorithm colors a given graph with d + 1 colors, where d is the maximum degree of a vertex in the graph [10]. Two neighboring nodes are said to have a conflict if they have the same color. A transition is made by choosing a single vertex. If the vertex has a conflict with any of its neighbors, it sets its own state to the least value that is not the state of any of its neighbors. We want to verify that the system stabilizes to a state with no conflicts. The measure function is chosen as the set of pairs of neighboring nodes with conflicts.
Here, the ordering on the image of the measure function is set inclusion, and ⊥ is the empty set.
Here E is the set of edges in the underlying graph. From the above conditions, using Theorem 2, $N_0$ is calculated to be 24.
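The Python sketch below checks the coloring case study on one small graph. It enumerates all colorings with colors 0..d and every per-vertex action, using the conflict-set measure with subset ordering. Python's `frozenset` comparisons `<=` and `<` are subset and proper subset. The example graph (a triangle with one pendant vertex) is hypothetical. Minimality holds by construction, since an empty conflict set is exactly a proper coloring. The sketch checks only this single finite instance, not all graphs.

```python
from itertools import product

EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]   # hypothetical example graph
NODES = range(4)
NBRS = {v: {u for e in EDGES for u in e if v in e and u != v} for v in NODES}
D_MAX = max(len(NBRS[v]) for v in NODES)   # maximum degree d


def recolor(x, v):
    """Action for vertex v: if v conflicts with a neighbour, take the least colour
    not used by any neighbour; otherwise leave the state unchanged."""
    used = {x[u] for u in NBRS[v]}
    if x[v] not in used:
        return x
    y = list(x)
    y[v] = min(c for c in range(D_MAX + 1) if c not in used)
    return tuple(y)


def conflicts(x):
    """Measure C(x): the set of edges whose endpoints share a colour."""
    return frozenset(e for e in EDGES if x[e[0]] == x[e[1]])


def check_conditions():
    """Invariance, progress and noninterference over all colourings with D_MAX + 1 colours."""
    states = list(product(range(D_MAX + 1), repeat=len(NODES)))
    for x in states:
        cx = conflicts(x)
        for v in NODES:
            cv = conflicts(recolor(x, v))
            if not cv <= cx:
                return "invariance fails"
            if cv < cx and any(not conflicts(recolor(recolor(x, u), v)) < cx
                               for u in NODES):
                return "noninterference fails"
        if cx and not any(conflicts(recolor(x, v)) < cx for v in NODES):
            return "progress fails"
    return "all conditions hold"


print(check_conditions())   # all conditions hold
```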

Leader Election
This algorithm is a modified version of the Chang-Roberts leader election algorithm [10]. We apply Theorem 1 directly by defining a straightforward measure function. The state of each node in the network consists of three components:
- a) its own UID;
- b) the index and UID of its proposed candidate;
- c) the status of the election according to the node (0: the node itself is elected; 1: the node is not the leader; 2: the node is still waiting for the election to finish).
A node i communicates its state to its clockwise neighbor j (j = i + 1 if i < N, and j = 1 otherwise). If the UID of i's proposed candidate is greater than j's UID, then j is out of the running. Initially, each node proposes itself as candidate. When a node gets back its own index and UID, it sets its election status to 0. This status and the identity of the leader propagate through the network. We want to verify that the system stabilizes to a state where a leader is elected. The measure function is the number of nodes with state 0. The function Sum() represents the sum of all elements in an array; it can be updated on each transition by looking only at the interacting nodes. We encode the sufficient conditions for stabilization of this algorithm using the strategy outlined in Section 3.2. For instance, the noninterference condition is encoded as follows.
Noninterference: $\forall q, r, s, t \in [N],\ (P(x, x', q, r) \wedge P(x, x'', s, t) \wedge P(x'', x''', q, r) \wedge C(x') < C(x)) \Rightarrow C(x''') < C(x)$.
From the above conditions, using Theorem 2, $N_0$ is calculated to be 35.

Shortest path
This algorithm computes the shortest path to every node in a graph from a root node. It is a simplified version of the Chandy-Misra shortest path algorithm [10].
We are allowed to distinguish the nodes with indices 1 or N in the formula structure specified by Theorem 2. The state of a node represents its distance from the root node. The root node (index 1) has state 0. Each pair of neighboring nodes communicates their states to each other. If one of them has a smaller value v, then the one with the larger value updates its state to v + 1. This stabilizes to a state where every node stores its shortest distance from the root. We do not have an explicit value of ⊥ for the measure function here, but it is not needed in this case. Let the interaction graph be a d-regular graph. The measure function is the sum of distances, and the ordering on its image is the usual one on natural numbers.
where the graph is d-regular.
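The Python sketch below simulates the shortest path protocol on a 6-node ring, which is a 2-regular graph, and tracks the sum-of-distances measure after every step. It makes several assumptions. Nodes are indexed from 0, with node 0 as the root (index 1 in the text). Every non-root node starts with an overestimate (100). Edges are scheduled round-robin until a full round changes nothing. The initial overestimates matter: a non-root node whose value starts below its true distance would never be corrected by this rule. The names `sp_step` and `stabilize` are hypothetical.

```python
def sp_step(x, i, j):
    """Action for edge (i, j): if one endpoint holds a smaller value v, the other
    endpoint takes v + 1 when that lowers its value. Node 0 is the root."""
    lo, hi = (i, j) if x[i] <= x[j] else (j, i)
    if hi == 0 or x[hi] <= x[lo] + 1:
        return x
    y = list(x)
    y[hi] = x[lo] + 1
    return tuple(y)


def stabilize(x, edges, max_rounds=1000):
    """Apply all edge actions round-robin until a full round changes nothing.
    Returns the final state and the measure (sum of distances) after every step."""
    sums = [sum(x)]
    for _ in range(max_rounds):
        start = x
        for i, j in edges:
            x = sp_step(x, i, j)
            sums.append(sum(x))
        if x == start:
            return x, sums
    raise RuntimeError("no fixed point reached within max_rounds")


n = 6
ring = [(k, (k + 1) % n) for k in range(n)]           # a 2-regular graph
final, sums = stabilize((0,) + (100,) * (n - 1), ring)
print(final)                                           # (0, 1, 2, 3, 2, 1)
print(all(a >= b for a, b in zip(sums, sums[1:])))     # True: the measure never increases
```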

Link Reversal
We describe the full link reversal algorithm as presented by Gafni and Bertsekas [19]. Given a directed graph with a distinguished sink vertex, it outputs a graph in which there is a path from every vertex to the sink. The distinguished sink node has index N. Any other node that detects that it has only incoming edges reverses the direction of all its edges with its neighbors. To prove termination, we use the vector of reversal distances. The reversal distance of a node is the least number of edges that must be reversed for the node to have a path to the sink. The states store the reversal distances, and the measure function is the identity.
The ordering on the image of the measure function is component-wise comparison: $x < x' \Leftrightarrow (\forall i \in [N],\ x[i] \le x'[i]) \wedge (\exists i \in [N],\ x[i] < x'[i])$. We stated earlier that the image of C should be well-ordered, a condition formulated with continuous spaces in mind. Component-wise comparison is only a partial order. The proposed ordering still works for this problem because the image of the measure function is discrete and bounded below (specifically, by $0^N$). Hence there is no infinite descending chain, and that is the property the proof of Theorem 1 relies on. We elaborate a bit on P here, because it needs to include the condition that the reversal distances are calculated accurately. Node N has reversal distance 0. Any other node i has reversal distance $rd(i) = \min(rd(j_1), \ldots, rd(j_m), rd(k_1) + 1, \ldots, rd(k_n) + 1)$. Here $j_p$ (p = 1, ..., m) are the nodes to which i has outgoing edges, and $k_q$ (q = 1, ..., n) are the nodes from which i has incoming edges. P also needs to include the condition that a transition changes the reversal distances of the transitioning nodes only.
From the above conditions, using Theorem 2, $N_0$ is calculated to be 21.
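The recursive definition of rd can be computed as a shortest-path problem. Following an edge forwards costs nothing, and following it backwards costs one reversal, so the least solution is a 0-1 BFS outward from the sink. The Python sketch below does this, together with the single-node reversal action and the component-wise order. It is an illustration under stated assumptions, not the paper's encoding. The three-node graph with nodes "A", "B" and sink "S" is hypothetical, and so are all function names. The run shows the reversal-distance vector of (A, B) descending component-wise on this example.

```python
from collections import deque


def reversal_distances(nodes, edges, sink):
    """rd(sink) = 0 and rd(i) = min(rd(j) for i -> j, rd(k) + 1 for k -> i),
    computed as a 0-1 BFS outward from the sink."""
    rd = {v: float("inf") for v in nodes}
    rd[sink] = 0
    queue = deque([sink])
    while queue:
        w = queue.popleft()
        for u, v in edges:
            if v == w and rd[w] < rd[u]:         # u -> w: u reaches w at no extra cost
                rd[u] = rd[w]
                queue.appendleft(u)
            elif u == w and rd[w] + 1 < rd[v]:   # w -> v: v must reverse this edge
                rd[v] = rd[w] + 1
                queue.append(v)
    return rd


def reverse_node(edges, v, sink):
    """Action of node v: if v is not the sink and all its edges are incoming,
    reverse all of them; otherwise leave the graph unchanged."""
    touching = [e for e in edges if v in e]
    if v == sink or not touching or any(e[0] == v for e in touching):
        return edges
    return frozenset((b, a) if b == v else (a, b) for a, b in edges)


def less(x, y):
    """Component-wise order: x <= y everywhere and x < y somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


nodes, sink = ("A", "B", "S"), "S"
g0 = frozenset({("S", "A"), ("B", "A")})   # A has only incoming edges
g1 = reverse_node(g0, "A", sink)           # A -> S, A -> B
g2 = reverse_node(g1, "B", sink)           # B -> A
vectors = [tuple(reversal_distances(nodes, g, sink)[v] for v in ("A", "B"))
           for g in (g0, g1, g2)]
print(vectors)                             # [(1, 1), (0, 1), (0, 0)]
```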

Experiments and Discussion
Using the Z3 SMT solver [4], we verified that instances of the aforementioned systems with sizes less than the small model parameter $N_0$ satisfy the four conditions of Theorem 1 (invariance, progress, noninterference and minimality). The models are checked by symbolic execution. In Figure 5, the x-axis represents the problem instance size, and the y-axis is the log of the running time (in seconds) for verifying Theorem 1 for the different algorithms. We observe that the running times grow rapidly with model size. For the binary gossip example, the program completes in ∼17 seconds for model size 7, which is its $N_0$ value. For link reversal, the program completes in ∼30 minutes for model size 13. We used complete interaction graphs in all our experiments, but as mentioned in Section 3.2, more general graphs can be encoded as well.
This method is a general approach to automated verification of stabilization properties of distributed algorithms under specific fairness constraints and structural constraints on graphs. The small model nature of the conditions to be verified is crucial to the success of this approach. We saw that many distributed graph algorithms, routing algorithms and symmetry-breaking algorithms can be verified using the techniques discussed in this paper. Finding a suitable measure function that satisfies the conditions of Theorem 1 is a non-trivial problem in itself. However, for the problems we study, the natural measure function of each algorithm seems to work.