Up-To Techniques for Generalized Bisimulation Metrics

Bisimulation metrics allow us to compute distances between the behaviors of probabilistic systems. In this paper we present enhancements of the proof method based on bisimulation metrics, by extending the theory of up-to techniques to (pre)metrics on discrete probabilistic concurrent processes. Up-to techniques have proved to be a powerful proof method for showing that two systems are bisimilar, since they make it possible to build (and thereby check) smaller relations in bisimulation proofs. We define soundness conditions for up-to techniques on metrics, and study compatibility properties that allow us to safely compose up-to techniques with each other. As an example, we derive the soundness of the up-to-bisimilarity-metric-and-context technique. The study is carried out for a generalized version of the bisimulation metrics, in which the Kantorovich lifting is parametrized with respect to a distance function. The standard bisimulation metrics, as well as metrics aimed at capturing multiplicative properties such as differential privacy, are specific instances of this general definition.


Introduction
Bisimulation has played a fundamental role in the analysis and verification of traditional concurrent systems. In recent times, however, there has been a growing tendency to consider probabilistic frameworks, partly to capture the random nature of interactions in distributed systems, partly to model and reason about protocols which make use of randomized mechanisms, such as those used in security and privacy. In this context, equivalences are not suitable, because they are not robust w.r.t. small variations of the transition probabilities, and they are usually replaced by (pseudo-)metrics: unlike an equivalence relation, a metric can vary smoothly as a function of the probabilities, and it can be used to measure the similarity of two systems in a more informative way than an equivalence relation.
Bisimulation metrics are particularly successful, especially in the area of concurrency. They can be defined by generalizing to metrics the bisimilarity "progress" relation; using a terminology introduced by Sangiorgi [12], we say that a relation between processes R progresses to S if for every pair of processes in R, every transition from one process is matched by a transition from the other, and the derivative processes are related by S. A bisimulation can then be defined as a relation that progresses to itself. Using the same terminology for probabilistic transitions, a metric d on states progresses to a metric l on distributions over performing action c. Process A′ behaves similarly to A, but with probability one fourth it performs action d instead of c. In order to prove that bm(A, A′) ≤ 1/2, we should define a metric assigning distance one half not only to the pair (A, A′), but also to all pairs of the form A | b^n and A′ | b^n, where b^n is the parallel composition of n instances of b, representing the pairs to be inspected after the action a is performed for the n-th time. Each of these pairs should then be proved to satisfy the bisimulation metric clauses. Using up-to techniques, we can prove that bm(A, A′) ≤ 1/2 just by considering a (pre)metric assigning distance one half to (A, A′), and maximal distance to all other non-identical states. When A performs a, then A′ replies with the same action and the (probabilistic) up-to-context technique guarantees that it is sound to directly use the distance on (A, A′) in place of the distance on (A | b, A′ | b).
Plan of the paper. Section 2 recalls some preliminary notions. Section 3 introduces some operators on premetrics and discusses some of their relevant properties. Section 4 presents the extension of the up-to techniques to metrics. Section 5 shows some examples of these techniques applied to probabilistic CCS and to the verification of differential privacy. Finally, Section 6 concludes. Some proofs were omitted for space reasons; they can be found in the report version of this paper [4].

Premetrics and metrics
An (extended) premetric on a set X is a very relaxed form of metric, namely a function m : X² → [0, +∞] satisfying only reflexivity (m(x, x) = 0). An (extended, pseudo) metric d on X is a premetric also satisfying symmetry (d(x, y) = d(y, x)) and the triangle inequality (d(x, z) ≤ d(x, y) + d(y, z)). For simplicity we drop "extended" and "pseudo" but they are always implied; we denote by M(X), M_d(X) the set of premetrics and metrics on X respectively. The kernel ker(m) of m is an equivalence relation on X relating elements at distance 0, i.e. (x, y) ∈ ker(m) iff m(x, y) = 0. Premetrics M(X) bounded by some maximal distance ⊤ ∈ [0, ∞] form a complete lattice under element-wise ordering (m ≤ m′ iff m(x, y) ≤ m′(x, y) for all x, y), with suprema and infima given by (⋁A)(x, y) = sup_{m∈A} m(x, y) and (⋀A)(x, y) = inf_{m∈A} m(x, y). Note that the lattice depends on the choice of ⊤, the value (possibly +∞) assigned by the top premetric ⊤_{M(X)} to all distinct elements, which we generally leave implicit.
Metrics M_d(X) bounded by ⊤ also form a complete lattice under ≤, with the same supremum operator. On the other hand, the infimum operator, denoted by ⋀_d, is different since the inf of metrics is not necessarily a metric. Still, infima exist and can be obtained by ⋀_d A = ⋁{d ∈ M_d(X) | d ≤ d′ for all d′ ∈ A}.
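To make the two lattices concrete, the following minimal sketch represents premetrics on a small finite set as Python dictionaries keyed by pairs. It is an illustration only: the helper names `is_premetric`, `is_metric`, `pointwise` and `identifying` are hypothetical and not part of the paper. It checks that the element-wise supremum of two metrics is again a metric, while their element-wise infimum can violate the triangle inequality.

```python
from itertools import product

def is_premetric(m, xs):
    """Reflexivity only: m(x, x) = 0, with all values non-negative."""
    return all(m[x, x] == 0 for x in xs) and all(v >= 0 for v in m.values())

def is_metric(m, xs):
    """A premetric that is also symmetric and satisfies the triangle inequality."""
    symmetric = all(m[x, y] == m[y, x] for x, y in product(xs, xs))
    triangle = all(m[x, z] <= m[x, y] + m[y, z] for x, y, z in product(xs, xs, xs))
    return is_premetric(m, xs) and symmetric and triangle

def pointwise(op, ms, xs):
    """Element-wise sup (op=max) or inf (op=min) of a non-empty family of premetrics."""
    return {(x, y): op(m[x, y] for m in ms) for x, y in product(xs, xs)}

def identifying(xs, pair):
    """Hypothetical helper: distance 0 inside `pair`, distance 1 between other distinct points."""
    same = lambda x, y: x == y or {x, y} == set(pair)
    return {(x, y): 0 if same(x, y) else 1 for x, y in product(xs, xs)}

xs = ["a", "b", "c"]
d1 = identifying(xs, ("a", "b"))   # a and b at distance 0
d2 = identifying(xs, ("b", "c"))   # b and c at distance 0
sup = pointwise(max, [d1, d2], xs)
inf = pointwise(min, [d1, d2], xs)

print(is_metric(sup, xs))   # the sup of metrics is a metric
print(is_metric(inf, xs))   # the inf has inf(a,c) = 1 > inf(a,b) + inf(b,c) = 0
```

In this example the element-wise infimum is only a premetric, which is why the infimum in the lattice of metrics has to be computed differently.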

Probabilistic automata, bisimilarity and metrics
Let S be a countable set of states.¹ We denote by P(S) the set of all (discrete) probability measures ∆, Θ over S, and by δ(s) the Dirac measure on s. A probabilistic automaton (henceforth PA) A is a tuple (S, A, D) where A is a countable set of action labels, and D ⊆ S × A × P(S) is a transition relation. We write s −α→ ∆ for (s, α, ∆) ∈ D, and define a family of functions

¹ A countable state space is assumed for simplicity; however, the proofs of several results do not rely on this assumption, and we expect those that do to be extendible to the continuous case.
C O N C U R 2 0 1 6

Let R ⊆ S × S be an equivalence relation on S; its lifting L(R) is an equivalence relation on P(S), defined as (∆, Θ) ∈ L(R) iff ∆, Θ assign the same probability to all equivalence classes of R. Probabilistic bisimilarity ∼ can be defined as the largest equivalence relation R on S such that (s, t) ∈ R and s −α→ ∆ imply t −α→ Θ with (∆, Θ) ∈ L(R).
Bisimilarity is a strong notion that often fails in probabilistic systems due to some "small" mismatch of probabilities. Hence, it is natural to define a metric that tells us how different two states are, and such that its kernel coincides with ∼. Let K : M_d(S) → M_d(P(S)) be a lifting operator mapping metrics on S to metrics on distributions over S. A well-known operator of this kind is the Kantorovich lifting, but it is not unique: in fact, the Kantorovich lifting itself can be generalized to a family of liftings, parametrized by an underlying distance (cf. Section 3.2).
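As a small illustration of the relational lifting L(R), the sketch below represents an equivalence relation by its list of equivalence classes and a discrete distribution by a dictionary from states to probabilities. The function name `lifted` and the example states are hypothetical; exact rationals are used to avoid rounding issues.

```python
from fractions import Fraction as Fr

def lifted(delta, theta, classes):
    """(delta, theta) is in L(R) iff both give the same mass to every class of R."""
    def mass(dist, cls):
        return sum(p for s, p in dist.items() if s in cls)
    return all(mass(delta, c) == mass(theta, c) for c in classes)

# R identifies s1 and s2, and keeps t separate.
classes = [{"s1", "s2"}, {"t"}]
delta = {"s1": Fr(1, 2), "t": Fr(1, 2)}
theta = {"s1": Fr(1, 4), "s2": Fr(1, 4), "t": Fr(1, 2)}
skewed = {"s1": Fr(3, 4), "t": Fr(1, 4)}

print(lifted(delta, theta, classes))    # same mass on each class
print(lifted(delta, skewed, classes))   # mass on {s1, s2} differs: 1/2 vs 3/4
```

The second pair shows the brittleness discussed in the text: any mismatch of class probabilities, however small, takes the pair out of L(R).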
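A metric d ∈ M_d(S) is a bisimulation metric if d(s, t) < ⊤ implies that whenever s −α→ ∆ then t −α→ Θ with K(d)(∆, Θ) ≤ d(s, t). The bisimilarity metric bm can be defined as the ⋀_d of all bisimulation metrics. Note that the lattice order of metrics has the inverse meaning of the one of relations: a smaller metric corresponds to a larger relation.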
It should be emphasized that, although ∼ is a uniquely defined relation, bm depends first on the choice of ⊤ and second on the choice of the K operator. If K, L commute with ker, i.e. ker(K(d)) = L(ker(d)) for all d ∈ M_d(S), it can be shown that ∼ = ker(bm) [3]. In other words, we can have different metrics, all characterizing bisimilarity at their kernel, but which do not coincide on the distance they assign to non-bisimilar states.
Note that, although ∼ was defined as the union of all equivalence relations satisfying the bisimulation property, the "equivalence" requirement is only for convenience, so that the lifting L(R) has a simple form; we could obtain the same ∼ as the union of all arbitrary relations R satisfying the same property. The same is true for bm: although in the literature it is typically defined as the ⋀_d of bisimulation metrics, we show in Section 4.1 that it can be constructed as the ⋀ of bisimulation premetrics. The advantage of using premetrics (resp. arbitrary relations) is that one only has to construct a simpler bisimulation premetric m (resp. bisimulation relation R), not necessarily satisfying the triangle inequality (resp. transitivity), in order to bound the bisimilarity distance between two states.

Premetrics: operations and their properties
In this section we discuss various operations on premetrics and their properties. These will provide the technical building blocks for developing the up-to techniques in Section 4.

Lipschitz property and reverse maps
The Lipschitz property is a fundamental, strong notion of continuity that plays a central role in all constructions of this work. A function f :

Proposition 1. The following hold:

Note that, from the first property above, we have that

Generalized Kantorovich lifting
To construct metrics for probabilistic systems, as described in Section 2, one needs to lift (pre)metrics on the state space S to (pre)metrics on P(S). One well-known such lifting is the Kantorovich metric, defined either via Lipschitz functions, or dually as a transportation problem. In [3] a generalization of this construction is given by extending the range of Lipschitz functions from (R,

A function f : S → V can be lifted to a function f̂ : P(S) → V by taking expectations: f̂(∆) = ∫_S f d∆. The requirement that V is convex ensures that f̂(∆) ∈ V. Then, given a premetric m ∈ M(S), we can define a lifted metric K(m) ∈ M(P(S)) as:

The lifting K depends on the choice of (V, d_V), which we generally leave implicit: many results are given for any member of the family, while some state specific conditions on d_V. Note the difference between m, the premetric being lifted, and d_V, a parameter of the construction. Using the construction of Section 2, each member of the family gives rise to a different bisimilarity metric bm, and under mild assumptions it can be shown that all of them characterize bisimilarity at their kernel [3].³ Of particular interest is the classical Kantorovich K⊕, corresponding to (V,

The corresponding bisimilarity metric obtained from the classical Kantorovich has been extensively studied; an important property of it is that bm(s, t) is a bound on the total variation distance between the trace distributions originated from states s, t (a quantitative analogue of the fact that bisimilarity implies trace equivalence). The multiplicative Kantorovich provides the same bound, but for the multiplicative total variation distance, a metric of central importance to the area of differential privacy. Hence, the multiplicative variant provides a means for verifying privacy for concurrent systems.
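For orientation, the generalized lifting described in this section can be sketched as a supremum over Lipschitz functions. The display below is a reconstruction under assumptions, not a quotation: it takes the lifting to be the usual primal Kantorovich form with the expectation f̂ in place of f, which agrees with the later remark that the sup in the definition of K ranges over a set of functions. The two instances of d_V in the comments follow the construction of [3] as we recall it and should be read as assumptions.

```latex
\[
  \hat{f}(\Delta) = \int_S f \, d\Delta ,
  \qquad
  K(m)(\Delta, \Theta) \;=\;
  \sup \bigl\{\, d_V\bigl(\hat{f}(\Delta), \hat{f}(\Theta)\bigr)
       \;\bigm|\; f : S \to V \text{ is } m, d_V\text{-Lip} \,\bigr\}
\]
% Assumed instances of the parameter d_V (not spelled out in the text above):
%   classical K_\oplus:       d_\oplus(x, y)  = |x - y|
%   multiplicative K_\otimes: d_\otimes(x, y) = |\ln x - \ln y|
```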
Somewhat unexpectedly, it turns out that K(m) is a proper metric, even if m itself is only a premetric: the metric properties of K(m) come from those of d V .

K(m) ∈ M_d(P(S)) (a proper metric) for all premetrics m ∈ M(S).
Another interesting property of K concerns its relationship with f←. Given f : A → B, let f* : P(A) → P(B) denote the function mapping ∆ to its pushforward measure, given by f*(∆)(Z) = ∆(f⁻¹(Z)) for Z ⊆ B. Then, we can map metrics in M(B) to those in M(P(A)) by either applying f← followed by K, or applying K followed by (f*)←. The two options are related by the following result:

Due to the above result, K can be shown to preserve the Lip property (cf. Section 3.4), which in turn is crucial for establishing the soundness of the up-to context techniques.

³ Note that these "mild assumptions" are orthogonal to the results of this paper. If they are not satisfied, ker(bm) might be strictly included in ∼, without violating any of our results.
Dual form on premetrics. The classical Kantorovich lifting can be dually expressed as a transportation problem. The primal and dual formulations are well known to coincide on metrics; however, this is no longer the case when we work on premetrics. To see this, notice that in the transportation problem, the distance K_d(m)(δ(s), δ(t)) (where K_d denotes the dual Kantorovich) between two point distributions is exactly m(s, t); in other words, δ←(K_d(m)) = m. On the other hand, K(m) is always a metric, and it can be shown that δ← ∘ K gives the metric closure operator.
Note that the dual forms of both the classical and the multiplicative Kantorovich are particularly useful since, in contrast to the primal form, they provide direct algorithms for computing the distance between finite distributions. Since the two forms no longer coincide, we should ensure that both of them are sound when used in the up-to techniques. For a general Kantorovich lifting K, let K_d be a monotone lifting that coincides with K on metrics. It can be shown that K(m) ≤ K_d(m) for all premetrics m, which in turn means that replacing K with K_d in the up-to techniques of Section 4 is sound: a progression whose target is built from the larger K_d(m) remains a progression when the target is decreased to the one built from K(m).
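One way to see why the primal lifting lies below any such K_d is the short chain below. It is a sketch that relies on three facts stated in this paper: K does not distinguish a premetric from its metric closure (Section 3.4), K_d coincides with K on metrics, and K_d is monotone. The notation m^↓ for the metric closure of m (the greatest metric below m) is our own choice here.

```latex
\[
  K(m) \;=\; K\bigl(m^{\downarrow}\bigr)
       \;=\; K_d\bigl(m^{\downarrow}\bigr)
       \;\le\; K_d(m)
  \qquad \text{since } m^{\downarrow} \le m .
\]
% On point distributions this specializes to
%   K(m)(\delta(s), \delta(t)) = m^{\downarrow}(s, t) \le m(s, t) = K_d(m)(\delta(s), \delta(t)).
```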

Metric closure and chaining
A metric can be thought of as a generalization of an equivalence relation, since it satisfies reflexivity, symmetry and transitivity (in the form of the triangle inequality). Similarly to the equivalence closure, it is natural to define the metric closure m↓ of m: intuitively, the goal is to decrease m just enough to enforce the metric properties. Since M_d is a complete lattice, m↓ can be naturally defined as the greatest metric below m: m↓ = ⋁{d ∈ M_d(S) | d ≤ m}. It can be shown that m ↦ m↓ is a closure operator whose fixpoints are exactly M_d(S).
Let M↓ denote the set {m↓ | m ∈ M}. We can show that metric closure commutes with the infima of the two lattices.
This, in turn, means that the metric infimum ⋀_d can be obtained by the premetric infimum ⋀ followed by metric closure, that is, ⋀_d M = (⋀M)↓ for every set M of metrics. Based on this, we extend the ⋀_d operator to premetrics, defined as ⋀_d M = (⋀M)↓.
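On a finite set, the greatest metric below a premetric can be computed directly. The sketch below is an illustration under our own assumptions (finite carrier, premetrics stored as dictionaries, hypothetical function name `metric_closure`): it first symmetrizes downwards and then takes shortest paths with the Floyd-Warshall scheme. Any metric below m is below the symmetrized premetric and, by the triangle inequality, below every path length, so the result is indeed the greatest one.

```python
from itertools import product

def metric_closure(m, xs):
    """Greatest (pseudo)metric below the premetric m on the finite set xs."""
    # Step 1: enforce symmetry by taking the smaller of the two directions.
    d = {(x, y): min(m[x, y], m[y, x]) for x, y in product(xs, xs)}
    # Step 2: enforce the triangle inequality via shortest paths.
    # product varies its first component slowest, so k is the outer loop.
    for k, x, y in product(xs, xs, xs):
        d[x, y] = min(d[x, y], d[x, k] + d[k, y])
    return d

xs = ["a", "b", "c"]
m = {(x, x): 0 for x in xs}
m.update({("a", "b"): 1, ("b", "a"): 3,
          ("b", "c"): 1, ("c", "b"): 1,
          ("a", "c"): 5, ("c", "a"): 5})

closed = metric_closure(m, xs)
print(closed["b", "a"])   # 1: symmetry pulls m(b, a) = 3 down to m(a, b) = 1
print(closed["a", "c"])   # 2: the path a -> b -> c is shorter than m(a, c) = 5
```

Applying `metric_closure` to a dictionary that is already a metric returns it unchanged, matching the statement that the fixpoints of the closure are exactly the metrics.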
Finally, we can define the chaining m_1 ⌢ m_2 of two premetrics as: (m_1 ⌢ m_2)(x, z) = inf_y (m_1(x, y) + m_2(y, z)). Chaining combines two premetrics by passing through some midway point, and will be used as a primitive block for constructing up-to techniques in Section 4.
⌢ is associative and monotone on both arguments.
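The following sketch illustrates chaining on a finite set, assuming it takes the additive form (m_1 ⌢ m_2)(x, z) = inf_y (m_1(x, y) + m_2(y, z)) suggested by the "midway point" description; the symbol ⌢ and the helpers `chain` and `make` are our own naming. The example checks associativity on a small instance and shows that chaining a premetric with itself can only decrease it, since the midway point may be chosen as an endpoint.

```python
from itertools import product

def chain(m1, m2, xs):
    """Chaining of two premetrics: go from x to z through the best midway point y."""
    return {(x, z): min(m1[x, y] + m2[y, z] for y in xs) for x, z in product(xs, xs)}

def make(xs, entries, top=5):
    """Hypothetical helper: premetric with the given entries and `top` elsewhere."""
    return {(x, y): 0 if x == y else entries.get((x, y), top)
            for x, y in product(xs, xs)}

xs = ["a", "b", "c"]
m1 = make(xs, {("a", "b"): 1, ("b", "c"): 1})
m2 = make(xs, {("b", "c"): 2, ("c", "a"): 1})

twice = chain(m1, m1, xs)
print(twice["a", "c"])   # 2, via the midway point b, instead of m1(a, c) = 5

left = chain(chain(m1, m2, xs), m1, xs)
right = chain(m1, chain(m2, m1, xs), xs)
print(left == right)     # associativity on this instance
```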

Operations that preserve Lipschitz
The Lipschitz property plays a central role in all constructions of this work, since both the Kantorovich lifting and the notion of progression depend on it. The following operations preserving this property will play a crucial role in the up-to techniques developed in Section 4.

Theorem 1. Let
The following hold: 1. Increasing/decreasing the source/target metric:

Note that property (3) above implies that K(m) = K(m↓), where m↓ is the metric closure of m, since the sup in the definition of K for both sides ranges over the same set of functions.

Convex and quasiconvex premetrics
If X is a convex set then X² can also be viewed as a convex set of vectors (x, y), where Σ_i λ_i (x_i, y_i) = (Σ_i λ_i x_i, Σ_i λ_i y_i) for all λ_i's such that Σ_i λ_i = 1. This allows us to talk about the convexity of a premetric jointly on both arguments. We say that m ∈ M(X) is:
convex iff m(Σ_i λ_i x_i, Σ_i λ_i y_i) ≤ Σ_i λ_i m(x_i, y_i),
quasiconvex iff m(Σ_i λ_i x_i, Σ_i λ_i y_i) ≤ max_i m(x_i, y_i).
Note that there exist several distinct abstract notions of convexity for general metric spaces; here (quasi)convexity is used in the usual sense of (quasi)convex functions.
The set P(S) is convex and so is V used in the construction of the Kantorovich lifting. It can be shown that if d_V is convex (resp. quasiconvex) then K(m) is also convex (resp. quasiconvex) for all m ∈ M(S). As a consequence, the classical Kantorovich K⊕(m) is convex (since | · | is convex), while the multiplicative variant K⊗(m) is quasiconvex (since d⊗ is quasiconvex).
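As a worked instance of the two notions, the absolute-value distance on the reals is jointly convex, and every convex premetric is in particular quasiconvex. The derivation below assumes weights λ_i ≥ 0 with Σ_i λ_i = 1.

```latex
\[
  \Bigl|\, \sum_i \lambda_i x_i - \sum_i \lambda_i y_i \Bigr|
  \;=\; \Bigl|\, \sum_i \lambda_i (x_i - y_i) \Bigr|
  \;\le\; \sum_i \lambda_i \, |x_i - y_i|
  \;\le\; \max_i |x_i - y_i| .
\]
% First inequality: triangle inequality for |.| (convexity).
% Second inequality: a weighted average never exceeds the maximum (quasiconvexity).
```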

Up-to techniques
In this section, we extend to the metric case the theory of up-to techniques presented in [12]. All the constructions assume some fixed underlying PA, which could be produced by a process calculus like the probabilistic CCS of Section 5. In what follows, we use l to denote premetrics on P(S).

Progressions
For a relation R on states of a non-probabilistic automaton, bisimulation can be defined in terms of progressions. A relation R progresses to R′, denoted by R ↣ R′, if whenever s R t and s −α→ s′ then t −α→ t′ and s′ R′ t′, and vice versa. A bisimulation can thereby be defined as a relation that progresses to itself, i.e. R ↣ R. An important difference in the probabilistic case is that progressions have different source and target domains. A premetric m on S (the source premetric) progresses to a premetric l on P(S) (the target premetric). From the definition of bisimulation (pre)metrics (Section 2), we have that m ∈ M(S) is a bisimulation (pre)metric iff m ↣ K(m). The bisimilarity metric is traditionally defined as the ⋀_d of all bisimulation metrics. Since metric closure preserves the Lip property, it also preserves the bisimulation property, which means that we can equivalently obtain bm as the ⋀ of all bisimulation premetrics.

Theorem 3. m is a bisimulation premetric iff m↓ (the metric closure of m) is a bisimulation metric. Hence: bm = ⋀{m | m is a bisimulation premetric}.
Assuming that m is a bisimulation premetric, we have that

F functions, soundness, respectfulness
We can define an up-to technique using a function F on M(P(S)). Ideally, for a premetric m on states, we want to allow the distance F(K(m))(∆, Θ) to be used instead of K(m)(∆, Θ) in a bisimulation proof, since a bound to F(K(m)) could be easier to compute. Therefore, we consider progressions of the form m ↣ F(K(m)), where F : M(P(S)) → M(P(S)).

Definition 4. A function F : M(P(S)) → M(P(S)) is sound if m ↣ F(K(m)) implies bm ≤ m. A premetric m such that m ↣ F(K(m)) is called a bisimulation premetric up-to F.
Hence, if F is a sound function then a bisimulation premetric up-to F allows us to derive upper bounds on the distance between two states. At the same time, using F in the target metric allows us to simplify the proof that the states actually satisfy these bounds.
Respectful functions. Given a function F : M(P(S)) → M(P(S)), one can prove that it is a sound up-to technique by means of a direct proof. However, it is known that the composition of sound functions on relations is not necessarily a sound function, and the standard counterexamples apply to the metric setting as well. In the non-probabilistic case, this has led to the definition of "respectfulness": an up-to function F on relations is respectful if whenever R ↣ R′ and R ⊆ R′, then F(R) ↣ F(R′) and F(R) ⊆ F(R′). Respectfulness implies soundness and at the same time is closed under composition [12]. On metrics, the definition of respectfulness must take care of the fact that the source and target metrics have different domains, and that the function F is defined on the domain P(S) of the target metric. Hence, a "corresponding" function G : M(S) → M(S) on the source metric has to be defined. Instead of constructing a specific such G, we only assume its existence and that it "plays well" with F and K, meaning that (K

A concrete G is then chosen in the respectfulness proof of each up-to technique F.

Definition 5. A function F : M(P(S)) → M(P(S)) is respectful iff it is monotone and there exists G : M(S) → M(S) such that for all m, m′ ∈ M(S):

Theorem 6. Any respectful function is sound.
Proof. Let F be respectful and let G be its corresponding source map from the definition of respectfulness. Assume that m ↣ F(K(m)). Analogously to the proof in [12], we define a sequence of metrics m_n, n ≥ 0 as: m_0 = m and m_{n+1} = G(m_n) ∧ m_n. By construction, m_n ≥ m_{n+1} for all n ≥ 0. We now show that m_n ↣ K(m_{n+1}) for all n ≥ 0. For the base case n = 0, from the respectfulness of F and the monotonicity of

For the inductive step, we want to show that

We have that:

Since progressions are closed under infima, ⋀_{n≥0} m_n ↣ K(⋀_{n≥0} m_n). Hence, ⋀_{n≥0} m_n is a bisimulation metric, and m ≥ ⋀_{n≥0} m_n, which concludes the proof.

Composing up-to techniques
The advantage of the respectfulness condition is that it makes it possible to derive the soundness of a composed up-to function just by proving the respectfulness of its components.We present here three operations that preserve respectfulness: function composition, function chaining, and taking the infimum of a set of functions (these operations respectively correspond to composition, chaining and union in the relational case).

Theorem 7. The composition of respectful functions is respectful.
The theorem is proved by showing that, given two respectful functions F_1, F_2 and their corresponding source maps G_1, G_2 from the definition of respectfulness,

The chaining of up-to functions is defined using the ⌢ operator from Section 3.3. Define the chaining of two functions pointwise, as (F_1 ⌢ F_2)(l) = F_1(l) ⌢ F_2(l). Using the properties of ⌢ proved in Proposition 5, we derive the following result.

Theorem 8. The chaining of respectful functions is respectful.
Analogously to chaining, define the infimum of a countable set of functions {F_i} as (⋀{F_i})(m) = ⋀{F_i(m)}. Given a countable set {F_i} of respectful functions with corresponding source maps {G_i}, we prove that the function ⋀{F_i} is respectful by using the source map ⋀{G_i}.

Up-to bisimilarity metric and up-to (quasi)convexity
The respectfulness (and soundness) of up-to techniques such as up-to-bisimilarity-metric can now be recovered by applying the operations presented in Section 4.2.1 to basic respectful functions.
Theorem 10. The identity F_id(l) = l and the constant-to-bm F_bm(l) = K(bm) functions are respectful.
The result directly follows from the definition: for the first we take G_id(m) = m, for the second G_bm(m) = bm. The up-to-bisimilarity-metric function can now be simply constructed as F_bm ⌢ F_id ⌢ F_bm, and it is respectful since the chaining of respectful functions is respectful (Theorem 8). By Theorem 9, we can also derive the respectfulness of the up-to-triangle-inequality function (corresponding to the up-to-transitive-closure technique on relations), defined as ⋀{⌢^n F_id}_{n≥1}, where ⌢^n F_id is the chaining of F_id with itself n times.
Another useful proof technique consists in splitting probability distributions into components with common factors, and then only considering the (possibly weighted) distances between the components. Define the up-to-quasiconvexity and the up-to-convexity functions as follows:

The respectfulness of the above up-to techniques depends on the (quasi)convexity of the Kantorovich operator. The following result is derived using the identity G_id as a source map.
Theorem 11. If K is quasiconvex (resp. convex) then F_qcv (resp. F_cv) is respectful.

Faithful contexts
With up-to context techniques, common contexts in the probability distributions reached in the bisimulation game can be safely removed. Given a set of states S, a context is a function C : S → S. As usual, we write C[s] to denote the image of s under C. We look at states in S as defined by a language whose terms are syntactically finite expressions, which justifies the following assumption: for any class 𝒞 of contexts and any state s, there is only a finite number of states s′ such that s = C[s′] for some C ∈ 𝒞.
Since the Lipschitz property is preserved by (Thm 1), it is easy to show that C

The function 𝒞(m) (respectively: 𝒞*(l)) can be alternatively characterized by considering the infimum value of m when a common context is removed from two terms (respectively: from two distributions). The context closure (s, t)_𝒞 of the pair (s, t) is the set of all pairs of terms of the form (C[s], C[t]), for C ∈ 𝒞. The context closure (∆, Θ)_𝒞* is extended to probability distributions using the set of contexts C* ∈ 𝒞*.

about a certain disease, and that for this purpose we are allowed to ask queries like "how many patients are affected by the disease". Queries of this kind are called counting queries and it is well known that they can be sanitized, i.e. made ε-differentially private, by adding geometric noise to the real answer, namely a noise distribution p_y(z) = c_z e^{−|z−y|ε}, where y is the real answer, z is the reported answer (ranging between 0 and n), and c_z is a normalization constant that depends only on z. Another database D′ is adjacent to D if it differs from D in only one record (i.e., one patient). Clearly, the (real) answers to the above query in two adjacent databases will differ by at most 1, and it is easy to see that the ratio between p_{y+1}(z) and p_y(z) is at most e^ε, which proves that ε-differential privacy is satisfied by the geometric-noise method.
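The ratio bound for the geometric-noise method can be checked numerically. The sketch below assumes the standard truncated geometric mechanism, in which c_z = 1/(1 + e^{−ε}) at the two endpoints z = 0 and z = n and c_z = (1 − e^{−ε})/(1 + e^{−ε}) elsewhere; these concrete constants are an assumption on our part, since the text only states that c_z depends on z alone. The function name `geometric_noise` is hypothetical.

```python
import math

def geometric_noise(y, n, eps):
    """Distribution p_y over reported answers 0..n: p_y(z) = c_z * exp(-|z - y| * eps)."""
    alpha = math.exp(-eps)
    def c(z):
        # Assumed normalization: the endpoints absorb the mass of the cut-off tails.
        return 1 / (1 + alpha) if z in (0, n) else (1 - alpha) / (1 + alpha)
    return [c(z) * alpha ** abs(z - y) for z in range(n + 1)]

n, eps = 10, 0.5
worst = 0.0
for y in range(n):                       # adjacent real answers y and y + 1
    p, q = geometric_noise(y, n, eps), geometric_noise(y + 1, n, eps)
    worst = max(worst, max(max(a / b, b / a) for a, b in zip(p, q)))

print(worst <= math.exp(eps) * (1 + 1e-9))   # the ratio never exceeds e^eps
```

Because c_z does not depend on the real answer, it cancels in the ratio p_{y+1}(z) / p_y(z), leaving e^{(|z−y| − |z−y−1|)ε}, whose exponent is at most ε in absolute value.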
Example 18. Consider the adjacent databases D, D′ where y and y + 1 patients are affected by the disease, respectively. We model D and D′ in pCCS as

where the prefix q represents the acceptance of a query request, and the action v_z represents the delivery of the reported answer. Consider now a process Q that queries the database. This can be defined as Q = q.+_{z=0}^n v_z.w_z, where +_{z=0}^n P_z denotes the nondeterministic choice P_0 + P_1 + ... + P_n. It is possible to prove that the processes

What we want to prove now is that the level of differential privacy decreases linearly with the number of queries (this is a well-known fact; the interest here is to show it using up-to techniques). Namely, if we define the processes P and P′ as the parallel composition of i instances of Q with D and D′ respectively, then K⊗(P, P′) ≤ iε. We prove this for the case i

In this paper we studied techniques to increase the efficiency of the bisimulation proof method in the case of the (extended) Kantorovich metric. To this purpose, we have explored properties of the Kantorovich lifting, and we have generalized to the case of metrics the bisimulation up-to F method by Sangiorgi. This allows us to reduce the size of the set of pairs for which we have to show the progress relation. The theory of compatibility [11] for up-to techniques generalizes the respectfulness conditions on relations in a lattice-theoretic setting, where general properties of the progress relation and of the up-to functions (seen as functionals on the same lattice) can be proved and later instantiated to capture bisimulation relations on automata. A more recent approach [10] consists in directly focusing on the greatest compatible (or respectful) function. In this paper we considered probabilistic systems and metrics, where the domain and the target of the progress relation are not in the same lattice anymore, and the up-to functions are defined on the target domain. The generalization of the techniques presented in this paper to a lattice-theoretic setting provides an interesting line of research. In [2], up-to techniques are developed in an abstract fibrational setting, from which one might be able to obtain techniques for metrics. Studying whether the techniques of this paper can be obtained in this way is left as future work.

Definition 2.
Given m ∈ M(S), l ∈ M(P(S)) we say that m progresses to l, written m ↣ l, iff m(s, t) < ⊤ implies that:
whenever s −α→ ∆ then t −α→ Θ with l(∆, Θ) ≤ m(s, t),
whenever t −α→ Θ then s −α→ ∆ with l(∆, Θ) ≤ m(s, t).
Using the Hausdorff metric, progression can be written as a Lipschitz property: m ↣ l iff ∀α : −α→ is m, H(l)-Lip.
From the results about operations preserving Lipschitz, and the fact that Hausdorff is monotone, we obtain the following useful properties of the progress relation:
m ↣ l implies m′ ↣ l′ for all m′ ≥ m, l′ ≤ l.
Let d ∈ M_d(P(S)). Then m ↣ d implies m↓ ↣ d, where m↓ is the metric closure of m.
Let m = ⋀_i m_i and l = ⋀_i l_i such that for all i: m_i ↣ l_i. Then m ↣ l.
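The two clauses of the progression can be checked mechanically on a finite PA. The sketch below is an illustration under our own assumptions: transitions are stored as a dictionary from states to lists of (action, distribution) pairs, the target premetric l is passed as a Python function, and total variation (`tv`) is used as a simple stand-in for l rather than as the paper's lifting K. The names `progresses`, `tv` and `premetric` are hypothetical.

```python
from fractions import Fraction as Fr

def tv(delta, theta):
    """Total variation distance between two discrete distributions (stand-in target)."""
    keys = set(delta) | set(theta)
    return sum(abs(delta.get(k, 0) - theta.get(k, 0)) for k in keys) / 2

def progresses(m, l, trans, states, top):
    """Check that m progresses to l: both matching clauses for every pair below top."""
    for s in states:
        for t in states:
            if not m[s, t] < top:
                continue
            s_moves, t_moves = trans.get(s, []), trans.get(t, [])
            # whenever s -a-> delta, some t -a-> theta has l(delta, theta) <= m(s, t)
            fwd = all(any(a == b and l(delta, theta) <= m[s, t] for b, theta in t_moves)
                      for a, delta in s_moves)
            # whenever t -a-> theta, some s -a-> delta has l(delta, theta) <= m(s, t)
            bwd = all(any(a == b and l(delta, theta) <= m[s, t] for a, delta in s_moves)
                      for b, theta in t_moves)
            if not (fwd and bwd):
                return False
    return True

states, top = ["s", "t", "x", "y"], 1
trans = {"s": [("a", {"x": Fr(1, 2), "y": Fr(1, 2)})],
         "t": [("a", {"x": Fr(3, 4), "y": Fr(1, 4)})]}

def premetric(dist_st):
    """Distance dist_st between s and t, top between all other distinct states."""
    m = {(u, v): 0 if u == v else top for u in states for v in states}
    m["s", "t"] = m["t", "s"] = dist_st
    return m

print(progresses(premetric(Fr(1, 4)), tv, trans, states, top))  # tv of the targets is 1/4
print(progresses(premetric(Fr(1, 8)), tv, trans, states, top))  # 1/4 > 1/8: clause fails
```

Pairs at distance ⊤, such as (x, y) here, impose no obligation, which is what allows a premetric used in a proof to assign the maximal distance to all pairs it does not care about.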

Theorem 9.
The infimum of a set of respectful functions is respectful.

Definition 12.
Given a class of contexts 𝒞, a premetric m is closed under 𝒞 iff C is m, m-Lip for all C ∈ 𝒞. The closure of m under 𝒞, denoted by 𝒞(m), is defined as the greatest premetric below m that is closed under 𝒞: 𝒞(m) = ⋁{m′ ∈ M(S) | m′ ≤ m and m′ is closed under 𝒞}.