DiRPOMS: Automatic Checker of Distributed Realizability of POMSets

DiRPOMS makes it possible to verify whether the specification of a distributed system can be faithfully realised by distributed agents that communicate using asynchronous message passing. A distinguishing feature of DiRPOMS is the use of sets of pomsets to specify the distributed system. This provides two benefits: syntax obliviousness and efficiency. By defining the semantics of a coordination language in terms of pomsets, it is possible to use DiRPOMS for several coordination models. Also, DiRPOMS can analyze pomsets extracted from system logs when the coordination model is unknown, and can therefore support coordination mining activities. Finally, by using sets of pomsets in place of flat languages, DiRPOMS can reduce the exponential blow-up of the analysis that is typical of multi-threaded systems due to interleaving.


Introduction
Choreographic approaches advocate two views of the same distributed system: a global view that describes ordering conditions and constraints under which messages are exchanged, and local views that are used by each party to build their components. Here, the global view is a specification that is realised by the combination of the local systems. As observed in [1], a source of problems is that some global specifications are impossible to implement using distributed agents in a given communication model.
DiRPOMS is a tool designed to analyze the realisability of choreographies. A choreography is formalized as a set of pomsets, where each pomset represents the causalities of events in one single branch of execution. Local views are modeled via finite state machines that communicate via asynchronous message passing. DiRPOMS checks realisability by verifying two closure conditions of the input pomsets and outputs the corresponding counterexamples.

The first use case of our tool is design-time analysis, where an architect checks whether a choreography is realizable. In this case, violations of the closure conditions (i.e. the counterexamples) make it possible to identify behaviors that are not included in the choreography but are necessary in any distributed system that implements it (using finite state machines and asynchronous message passing). The use of sets of pomsets allows this analysis to be syntax oblivious, since the semantics of several existing choreographic models (e.g. [11], [6], [8]) can be expressed using sets of pomsets.
The second use case is choreography mining. In this case an analyst extracts a hypothesis choreography from (partial) execution logs of a distributed system. Here, violations of the closure conditions make it possible to identify behaviors of the distributed system that are not included in the logs, thus supplementing partial information regarding the system under test and reducing the number of executions needed to extract a model of the system.
The paper is organized as follows. In Section 2 we present the models for local and global views and in Section 3 we briefly recall the theory supporting our tool. Section 4 presents some examples of faulty choreographies, which cannot be implemented using communicating finite state machines. Section 5 shows an example of choreography mining, where the tool is used to identify missing traces from a partial execution log. Usage, implementation, and evaluation of the tool are presented in Sections 6, 7, and 8.

Local and global views of choreographies
We assume a set $\mathcal{P}$ of distributed participants (ranged over by A, B, etc.) and a set $\mathcal{M}$ of messages (ranged over by m, x, etc.). Participants communicate by exchanging messages over channels, which are elements of the set $\mathcal{C} = \mathcal{P} \times \mathcal{P}$. The set of (communication) labels $\mathcal{L}$, ranged over by $l$ and $l'$, is defined by $\mathcal{L} = \mathcal{L}^! \cup \mathcal{L}^?$, where (outputs) $\mathcal{L}^! = \mathcal{C} \times \{!\} \times \mathcal{M}$ and (inputs) $\mathcal{L}^? = \mathcal{C} \times \{?\} \times \mathcal{M}$. We shorten (A, B, !, m) as A B!m and (A, B, ?, m) as A B?m. The subject of an output is its sender (sbj(A B!m) = A) and the subject of an input is its receiver (sbj(A B?m) = B).
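As a minimal sketch of the label structure just described, labels can be represented as 4-tuples and the subject computed from the direction. The names `Label` and `sbj` below are illustrative assumptions and are not part of the DiRPOMS API.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """A communication label (A, B, !, m) or (A, B, ?, m). Illustrative only."""
    sender: str
    receiver: str
    kind: str      # "!" for an output, "?" for an input
    message: str

    def __str__(self) -> str:
        # Shortened form used in the text, e.g. AB!m
        return f"{self.sender}{self.receiver}{self.kind}{self.message}"


def sbj(label: Label) -> str:
    """Subject of a label: the sender of an output, the receiver of an input."""
    return label.sender if label.kind == "!" else label.receiver


out = Label("A", "B", "!", "m")
inp = Label("A", "B", "?", "m")
print(out, sbj(out))
print(inp, sbj(inp))
```

Keeping the channel (sender, receiver) inside the label makes it straightforward to select the alphabet of a single participant by filtering on `sbj`.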
Local systems are modeled in terms of communicating finite state machines (CFSMs) [1].
Definition 1. An A-CFSM is a finite-state automaton with set of states $Q$ on the alphabet $\{l \in \mathcal{L} \mid \mathrm{sbj}(l) = A\}$, where $q_0 \in Q$ is the initial state and $F \subseteq Q$ is the set of accepting states. A (communicating) system is a map $S$ assigning an A-CFSM to each participant $A \in \mathcal{P}$.
A configuration of a communicating system consists of a state-map $q$, which maps each participant to its local state, and a buffer-map $b$, which maps each channel and message to the number of outputs that have not yet been consumed. A configuration is accepting if all buffers are empty and the local state of each participant is accepting, while it is a deadlock if no accepting configuration is reachable from it. The initial configuration is the one where, for all $A \in \mathcal{P}$, $q(A)$ is the initial state of the corresponding CFSM and all buffers are empty.
The semantics of communicating systems is defined in terms of a labelled transition relation between configurations. Each transition models one action performed by one machine: an output, which adds a message to a channel, or an input, which consumes a pending message from a channel. Formally, $\langle q; b \rangle \stackrel{l}{\Longrightarrow} \langle q'; b' \rangle$ if there is a message $m \in \mathcal{M}$ such that either (1) or (2) below holds:
(1) $l = A\,B!m$, the machine of A has a transition labelled $l$ from $q(A)$ to a state $q_A'$, $q' = q[A \mapsto q_A']$, and $b' = b[(A\,B, m) \mapsto b(A\,B, m) + 1]$;
(2) $l = A\,B?m$, the machine of B has a transition labelled $l$ from $q(B)$ to a state $q_B'$, $b(A\,B, m) > 0$, $q' = q[B \mapsto q_B']$, and $b' = b[(A\,B, m) \mapsto b(A\,B, m) - 1]$;
where $f[x \mapsto y]$ represents the update of a function $f$ at $x$ with value $y$.
Definition 2. The language of a communicating system $S$ is the set $L(S) \subseteq \mathcal{L}^*$ of sequences $l_0 \ldots l_{n-1}$ such that there exists a run labelled with $l_0 \ldots l_{n-1}$ that starts in the initial configuration and ends in an accepting configuration.
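The transition relation between configurations can be sketched as follows. This is only an illustration under stated assumptions: machines are encoded as dictionaries from (state, label) to the next state, buffers as a `Counter` of pending messages per channel and message, and the names `step` and `is_accepting` are hypothetical, not taken from DiRPOMS.

```python
from collections import Counter


def step(machines, q, b, label):
    """Return the successor configuration (q', b'), or None if label is not enabled.

    machines: participant -> {(state, label): next_state}   (assumed encoding)
    q: participant -> local state
    b: Counter mapping (sender, receiver, message) to the number of pending messages
    label: (sender, receiver, kind, message) with kind "!" or "?"
    """
    sender, receiver, kind, msg = label
    subject = sender if kind == "!" else receiver
    target = machines[subject].get((q[subject], label))
    if target is None:
        return None  # the machine of the subject has no such transition
    channel_msg = (sender, receiver, msg)
    if kind == "?" and b[channel_msg] == 0:
        return None  # an input needs a pending message to consume
    q2, b2 = dict(q), Counter(b)
    q2[subject] = target
    b2[channel_msg] += 1 if kind == "!" else -1
    return q2, b2


def is_accepting(q, b, accepting):
    """All buffers are empty and every participant is in an accepting state."""
    return all(n == 0 for n in b.values()) and all(q[p] in accepting[p] for p in q)


send, recv = ("A", "B", "!", "x"), ("A", "B", "?", "x")
machines = {"A": {(0, send): 1}, "B": {(0, recv): 1}}
accepting = {"A": {1}, "B": {1}}
q0, b0 = {"A": 0, "B": 0}, Counter()

blocked = step(machines, q0, b0, recv)   # input before the output: not enabled
q1, b1 = step(machines, q0, b0, send)    # A sends x, one message is pending
q2, b2 = step(machines, q1, b1, recv)    # B consumes x, buffers are empty again
print(blocked, is_accepting(q2, b2, accepting))
```

In this toy system the only word of the language is the output followed by the matching input, since the run must end with empty buffers.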
The notion of realisability is given in terms of the relation between the language of the global view and the one of a system of local views "implementing" it [1].

Definition 3 (Realisability).
A language $L \subseteq \mathcal{L}^*$ is weakly realisable if there is a communicating system $S$ such that $L = L(S)$; when $S$ is also deadlock-free we say that $L$ is safely realisable.
We model the global views in terms of sets of pomsets, where each pomset models one branch of execution.
Definition 4 (Pomsets [4]). A labelled partially-ordered set (lposet) is a triple $(E, \leq, \lambda)$, with $E$ a set of events, $\leq\ \subseteq E \times E$ a reflexive, anti-symmetric, and transitive relation on $E$, and $\lambda : E \to \mathcal{L}$ a labelling function mapping events in $E$ to labels in $\mathcal{L}$. A pomset is an isomorphism class of lposets.
Fig. 2: A set of two pomsets that represents the global view of the system of Figure 1
Pomsets make it possible to represent scenarios where the same communication occurs multiple times. Intuitively, $\leq$ represents causality: if $e < e'$ then $e'$ is caused by $e$. Note that $\lambda$ is not required to be injective: $\lambda(e) = \lambda(e')$ means that $e$ and $e'$ model different occurrences of the same action. In the following, $[E, \leq, \lambda]$ denotes the isomorphism class of $(E, \leq, \lambda)$, symbols $r, r', \ldots$ (resp. $R, R', \ldots$) range over pomsets (resp. sets of pomsets), and we assume that each pomset $r$ contains at least one lposet, which will possibly be referred to as $(E_r, \leq_r, \lambda_r)$.
The projection $r{\restriction}_A$ of a pomset $r$ on a participant $A \in \mathcal{P}$ is obtained by restricting $r$ to the events having subject A. We represent pomsets as (a variant of) Hasse diagrams of the immediate predecessor relation.
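A projection can be sketched as follows, assuming a pomset is given by a set of event identifiers, a set of order pairs (not necessarily transitively closed), and a labelling dictionary. The order is closed transitively before restricting it, so that causality passing through removed events is preserved. The function names are illustrative, not the DiRPOMS API.

```python
def sbj(label):
    """Subject of a label (sender, receiver, kind, message)."""
    sender, receiver, kind, _ = label
    return sender if kind == "!" else receiver


def transitive_closure(order):
    """Smallest transitive relation containing the given set of pairs."""
    closure = set(order)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not extra:
            return closure
        closure |= extra


def project(events, order, label, participant):
    """Restrict a pomset to the events whose subject is the given participant."""
    keep = {e for e in events if sbj(label[e]) == participant}
    closed = transitive_closure(order)
    return (keep,
            {(a, b) for (a, b) in closed if a in keep and b in keep},
            {e: label[e] for e in keep})


# A sends x and then z to B; B receives them in the same order.
label = {
    "a1": ("A", "B", "!", "x"), "b1": ("A", "B", "?", "x"),
    "a3": ("A", "B", "!", "z"), "b3": ("A", "B", "?", "z"),
}
order = {("a1", "b1"), ("a1", "a3"), ("a3", "b3"), ("b1", "b3")}
events_b, order_b, label_b = project(label.keys(), order, label, "B")
print(sorted(events_b), sorted(order_b))
```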
A pomset is well-formed if (1) for every output A B!m there is at most one immediate successor input A B?m, (2) for every input A B?m there exists exactly one immediate predecessor output A B!m, (3) if an event immediately precedes an event having a different subject then these events are a matching output and input respectively, and (4) ordered output events with the same label cannot be matched by inputs that have the opposite order. A pomset is complete if there is no output event without a matching input event.
Definition 5. Given a pomset $r = [E, \leq, \lambda]$, a linearization of $r$ is a string in $\mathcal{L}^*$ obtained by considering a total ordering of the events $E$ that is consistent with the partial order $\leq$, and then replacing each event by its label. The language of a pomset, $L(r)$, is the set of all linearizations of $r$. The language of a set of pomsets $R$ is simply defined as $L(R) = \bigcup_{r \in R} L(r)$.
The set of pomsets of Figure 2 represents the global view of the system of Figure 1, i.e. the two views have the same language. The two pomsets represent two different scenarios (i.e. branches): in the left scenario A sends x, in the right scenario A sends y.
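The definition of linearization can be illustrated with a small enumeration, assuming a pomset is given as event identifiers, order pairs, and a labelling dictionary. This sketch is exponential in general and only meant to make the definition concrete; it is not how DiRPOMS checks realisability.

```python
def linearizations(events, order, label):
    """Yield the label sequence of every total order of events consistent with order.

    order is a set of pairs (e1, e2) meaning e1 < e2; it need not be transitively closed.
    """
    def extend(done, remaining):
        if not remaining:
            yield tuple(label[e] for e in done)
            return
        for e in remaining:
            # e can come next only if none of its predecessors is still pending
            if all(p not in remaining for (p, s) in order if s == e):
                yield from extend(done + [e], [x for x in remaining if x != e])

    yield from extend([], list(events))


def language(events, order, label):
    """Language of a single pomset: the set of its linearizations."""
    return set(linearizations(events, order, label))


# Two independent message exchanges towards B: A sends x, C sends x.
label = {"a1": "AB!x", "b1": "AB?x", "c2": "CB!x", "b2": "CB?x"}
order = {("a1", "b1"), ("c2", "b2")}
words = language(label.keys(), order, label)
print(len(words))
```

The language of a set of pomsets is then the union of the languages of its elements, for example `set().union(*(language(*r) for r in R))` for a list `R` of such triples.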

Realisability conditions
Our tool uses the verification conditions for realisability identified in [5]. These conditions require the following definitions.
Definition 6 (Inter-Participant Closure). Let $(r^A)_{A \in \mathcal{P}}$ be a tuple of pomsets where $r^A = r^A{\restriction}_A$ for all $A \in \mathcal{P}$. The inter-participant closure $\Box((r^A)_{A \in \mathcal{P}})$ is the set of all well-formed pomsets whose events, labelling, and order are the union of those of the pomsets $r^A$, extended with order pairs from output events to input events on the same channel and message.
The inter-participant closure takes one pomset for every participant and generates all "acceptable" matches between output and input events. We use a tuple of pomsets $(r^A, r^B)$ to illustrate the inter-participant closure: pomset $r^A$ represents a fork while pomset $r^B$ represents a join. The inter-participant closure of $(r^A, r^B)$ consists of four well-formed pomsets.
Definition 7 (More permissive relation). A pomset $r'$ is more permissive than a pomset $r$, written $r \sqsubseteq r'$, when $E_r = E_{r'}$, $\lambda_r = \lambda_{r'}$, and $\leq_r\ \supseteq\ \leq_{r'}$.
The more permissive relation guarantees language inclusion, i.e. if $r \sqsubseteq r'$ then $L(r) \subseteq L(r')$.
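The relation and the language inclusion it guarantees can be checked on a toy example. The sketch below assumes both pomsets use the same event identifiers and transitively closed orders, so no isomorphism search is needed; the tool itself works up to isomorphism. The function names are illustrative.

```python
from itertools import permutations


def is_more_permissive(r_prime, r):
    """True when r_prime is more permissive than r: same events and labels,
    and the order of r contains the order of r_prime."""
    (events1, order1, label1), (events2, order2, label2) = r_prime, r
    return set(events1) == set(events2) and label1 == label2 and set(order1) <= set(order2)


def language(pomset):
    """Brute-force language of a pomset (events, order, label)."""
    events, order, label = pomset
    words = set()
    for perm in permutations(events):
        position = {e: i for i, e in enumerate(perm)}
        if all(position[a] < position[b] for (a, b) in order):
            words.add(tuple(label[e] for e in perm))
    return words


label = {"e1": "AB!x", "e2": "AB!y"}
r = (("e1", "e2"), {("e1", "e2")}, label)   # e1 must precede e2
r_prime = (("e1", "e2"), set(), label)      # no ordering constraint

print(is_more_permissive(r_prime, r), language(r) <= language(r_prime))
```

Removing order pairs can only add linearizations, which is why the less constrained pomset has the larger language.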

Definition 8 (Prefix pomsets).
A prefix of a pomset r is a pomset on a subset of the events of r that preserves the order and labelling of r.
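Prefixes can be enumerated with a brute-force sketch. It assumes, beyond what the definition above states explicitly, that a prefix is downward closed: whenever an event is included, all of its predecessors are included as well. The helper name `prefixes` is illustrative and the enumeration is exponential, so it is only suitable for small examples.

```python
from itertools import combinations


def prefixes(events, order):
    """Return all downward-closed subsets of events (assumed notion of prefix).

    order is a set of pairs (a, b) meaning a < b. Checking immediate predecessors
    is sufficient, because downward closure then follows by induction.
    """
    events = list(events)
    result = []
    for size in range(len(events) + 1):
        for subset in combinations(events, size):
            chosen = set(subset)
            if all(a in chosen for (a, b) in order if b in chosen):
                result.append(frozenset(chosen))
    return result


# Two independent threads, each with three sequential events.
events = ["a1", "a2", "a3", "b1", "b2", "b3"]
order = {("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("b2", "b3")}
all_prefixes = prefixes(events, order)
print(len(all_prefixes))
```

Each prefix keeps the order and labelling of the original pomset restricted to the chosen events.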
The realisability conditions presented in [5] are two closure conditions, which are formalized by the following theorem.
Theorem 1. If $R$ satisfies CC2-POM then $L(R)$ is weakly realisable; if $R$ also satisfies CC3-POM then its language is safely realisable, where
- CC2-POM($R$): for all tuples $(r^A)_{A \in \mathcal{P}}$ of pomsets of $R$ and for every pomset $r \in \Box((r^A{\restriction}_A)_{A \in \mathcal{P}})$, there exists $r' \in R$ such that $r \sqsubseteq r'$ (i.e. $r'$ is more permissive than $r$).
- CC3-POM($R$): for all tuples of pomsets $(\bar{r}^A)_{A \in \mathcal{P}}$ such that $\bar{r}^A$ is a prefix of a pomset $r^A \in R$ for every $A$, and for every pomset $r \in \Box((\bar{r}^A{\restriction}_A)_{A \in \mathcal{P}})$, there is a pomset $r' \in R$ and a prefix $\bar{r}'$ of $r'$ such that $r \sqsubseteq \bar{r}'$.
Intuitively, CC2-POM requires that if the possible executions of a pomset cannot be distinguished by any of the participants of $R$, then those executions must be part of the language of $R$. Similarly, CC3-POM requires that if partial executions cannot be distinguished by any of the participants of $R$, then those executions must be prefixes of executions in the language of $R$.

Realisability by examples
In this section we give some examples of the problems related to implementing pomset-based choreographies using CFSMs. Distributed choices can prevent faithful implementations when coordination is lacking. For example, the set $R_1$ models two branches. Participants A and C should both send the message x or both send the message y. However, A and C do not coordinate to achieve this behaviour; this makes it impossible for them to distributively commit to a common choice. $R_1$ satisfies CC2-POM. However, pomset $r_1$, which represents the case where A and C do not agree on the message to deliver, is in the inter-participant closure of prefixes and violates CC3-POM.
In the second example, $R_2$, the two branches describe different orders of the same set of events. The behaviour of A (and D) is the same in both branches: A (resp. D) concurrently sends message x (resp. y) to B and C. The behaviours of B and C differ: in the left branch they first receive the message from A and then the one from D; in the right branch they have the same interactions but in the opposite order. This choreography cannot be realised since, intuitively, it requires B and C to commit to the same order of reception without communicating with each other. Pomset $r_2$, which captures the case where B and C do not agree on the order of message reception, is in the inter-participant closure and violates CC2-POM.
The last example demonstrates problems caused by the use of the same message in concurrent threads. The set $R_3$ consists of a single pomset, which represents two concurrent sub-choreographies. The use of message x in both threads can cause the following problem: (1) the left thread of A executes A C!l1 and A B!x; (2) after the output B C!r2, the right thread of B executes the input A B?x, thus "stealing" the message x generated by the left thread of A and meant for the left thread of B; (3) the right thread of B executes B C!r3. Pomset $r_3$, which represents this case, is in the inter-participant closure and violates CC2-POM.

Identifying missing execution logs for choreography mining
Choreography (and process) mining [10] consists of extracting a hypothesis choreography from a partial execution log of a distributed system. In this section we show that violations of the closure conditions can be used to identify behaviors of the distributed system that are not included in the log. Therefore, the closure conditions can support mining and testing activities.
Consider a partial execution log of the system of Figure 1 and the hypothesis choreography extracted from it. A violation of the closure conditions yields a counterexample pomset. This pomset represents the fact that there must be an execution of the system where A sends y and B receives the first message from C. This information can be used to fix the hypothesis choreography, by enabling the traces that are necessarily part of the behaviors of the distributed system. The set of pomsets of Figure 2 satisfies both closure conditions and its language includes the initial partial execution log.

Tool usage
DiRPOMS is written in Python and provides an API to build and manipulate pomsets and to check the closure conditions. The API can be invoked from any Python development environment (in the demo video we use org-mode [9] to analyze the examples using literate programming).
A typical DiRPOMS session starts by defining the set of pomsets modeling the choreography. Pomsets can be loaded using the existing formats (including GEXF, GraphML, and JSON), be generated by translating other choreography models, or be dynamically generated. For example, the following snippet creates $R_1$ as input choreography:

    import networkx as nx

    # a choreography is a list of pomsets
    global_view = []
    # a pomset is defined using a directed graph
    # left pomset of R1
    gr1 = nx.DiGraph()
    # add_pair(gr1, A, B, n, m) creates two nodes "out-n" and "in-n"
    # labeled with AB!m and AB?m, connects the two events and returns
    # the pair (out-n, in-n)
    abx = add_pair(gr1, "a", "b", 1, "x")
    cby = add_pair(gr1, "c", "b", 2, "x")
    abz = add_pair(gr1, "a", "b", 3, "z")
    # Input pomsets do not need to be transitive (transitive closure
    # is done internally)
    gr1.add_edge(abx[1], abz[1])
    gr1.add_edge(cby[1], abz[1])
    gr1.add_edge(abx[0], abz[0])
    global_view.append(gr1)
    # right pomset of R1
    gr2 = nx.DiGraph()
    abx = add_pair(gr2, "a", "b", 1, "y")
    cby = add_pair(gr2, "c", "b", 2, "y")
    abz = add_pair(gr2, "a", "b", 3, "z")
    gr2.add_edge(abx[1], cby[1])
    gr2.add_edge(cby[1], abz[1])
    gr2.add_edge(abx[0], abz[0])
    global_view.append(gr2)

The closure condition CC2-POM can be checked using

    cc2c = cc2closure(global_view)  # cc2c is the list of pomsets
    cc2res = cc2pom(cc2c, global_view)

The result cc2res is a map that yields, for each index i of cc2c, the index of global_view matching it, or None if cc2c[i] is a counterexample. Similarly, closure condition CC3-POM can be checked using

    (cc3c, pref) = cc3closure(global_view)  # cc3c and pref are lists
    cc3res = cc3pom(cc3c, pref)

The list pref contains the prefixes of the input choreography, and the result cc3res maps each index of cc3c to an index of pref or None. The counterexamples can be rendered using:

    errors = counterexamples(cc3c, cc3res)
    debug_graphs(errors, "output-folder")  # generates pictures of errors

DiRPOMS also provides a command line utility, which uses the GraphML format for input and output of pomsets. The left pomset of $R_1$ can be defined by the following GraphML file:

    <graphml xmlns="http://graphml.graphdrawing.org/xmlns">
      <key attr.name="label" attr.type="string" for="node" id="d0"/>
      <graph edgedefault="directed">
        <node id="b-2"><data key="d0">CB?x</data></node>
        <node id="b-3"><data key="d0">AB?z</data></node>
        <node id="b-1"><data key="d0">AB?x</data></node>
        <node id="a-1"><data key="d0">AB!x</data></node>
        <node id="a-3"><data key="d0">AB!z</data></node>
        <node id="c-2"><data key="d0">CB!x</data></node>
        <edge source="b-2" target="b-3"/>
        <edge source="b-1" target="b-3"/>
        <edge source="a-1" target="a-3"/>
        <edge source="a-1" target="b-1"/>
        <edge source="a-3" target="b-3"/>
        <edge source="c-2" target="b-2"/>
      </graph>
    </graphml>

Each GraphML file must contain a key element, specifying the existence of the node attribute label of type string. Each node has a unique identifier and a data sub-element, which defines the node label. The following command executes the analysis of a choreography:

Tool implementation
DiRPOMS relies on the NetworkX package for graph operations: pomsets are represented as directed labelled acyclic graphs. The tool consists of five modules:
- utils: provides export of pomsets to png and utilities to build pomsets
- pomset: provides functions to process pomsets, e.g. query lists of participants and messages, projections per participant or message, transitive closure and reduction, enumeration of prefixes, enumeration of linearizations

Tool evaluation
The main primitive of NetworkX used by the tool is subgraph_is_isomorphic, which returns true iff $r_1$ is (label-preserving) isomorphic to a subgraph of $r_2$. If $r_1$ and $r_2$ have the same number of nodes and the predicate holds then $r_1$ is more permissive than $r_2$ ($r_2 \sqsubseteq r_1$). The complexity of finding a label-preserving graph isomorphism is in general exponential in the number of events. However, since the graphs are acyclic, the complexity can be bounded in terms of the number of concurrently-repeated actions, i.e. events that have the same label, are unordered, and have the same number of predecessors with the same label (e.g. A B!x in $R_3$). If there are no concurrently-repeated actions then isomorphism of pomsets can be checked in polynomial time with respect to the number of events.
We report the performance of our tool for the examples. The experiments have been executed on an Intel 2.2 GHz i7 with 16 GB of RAM. The table reports the size of the closures, the number of counterexamples, and the processing time in milliseconds. In general the evaluation of closure conditions is fast for simple examples. However, the number of prefixes to check in CC3-POM can be large when participants have several concurrent threads. One of the advantages of checking the closure conditions on pomsets with respect to previous work [1] is that the former does not require the explicit computation of the language of the family of pomsets, which can lead to combinatorial explosion due to interleavings. In fact, in case of concurrency, the number of prefixes is usually smaller than the number of possible linearizations of a pomset. For example, consider a pomset that consists of two independent threads, each one consisting of n sequential and distinguished events.

The closure condition in [1] requires directly computing the language of the pomset, which has at least $2^n$ words (the two threads have $\binom{2n}{n}$ interleavings). Instead, the pomset has $(n + 1)^2$ prefixes. As a further example, the set of pomsets $R_3$ contains one pomset and has two actions that occur in both threads: A B!x and A B?x. The inter-participant closure has exactly two pomsets: the element of $R_3$ itself and $r_3$. The left and right subpomsets of $R_3$, which represent the two threads, have 32 different linearizations, each one consisting of 8 events. Therefore the language of $R_3$ consists of $32 \cdot 32 \cdot 2^8 = 2^{18}$ words. On the other hand, analyzing CC3-POM for $R_3$ requires checking 668 prefixes.
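The gap between the two quantities can be tabulated with a few lines of Python. This is a back-of-the-envelope sketch, assuming the pomset with two independent threads of n sequential, pairwise distinct events discussed above: its linearizations are the interleavings of two chains, counted by the binomial coefficient C(2n, n), which is at least 2^n, while its prefixes are pairs of chain prefixes.

```python
from math import comb


def count_linearizations(n: int) -> int:
    """Interleavings of two independent chains of n distinct events each."""
    return comb(2 * n, n)


def count_prefixes(n: int) -> int:
    """A prefix picks the first i events of one chain and the first j of the other."""
    return (n + 1) ** 2


for n in (2, 4, 8, 16):
    print(n, count_prefixes(n), count_linearizations(n), 2 ** n)
```

For very small n the number of prefixes can exceed the number of linearizations, but the former grows quadratically while the latter grows exponentially.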

Concluding remarks
Realisability of specifications is of concern for both practical and theoretical reasons. Several works (e.g., [2,3,7]) defined constraints to guarantee soundness of the implementation of choreographies. These approaches address the problem for specific languages and use conditions that rely on the syntactical structure of the specification. DiRPOMS provides a language-independent tool to check realisability of choreographies. Therefore, it can be used for several choreographic models, as long as their semantics can be expressed via sets of partial orders.
There are two main limitations of DiRPOMS that we plan to address. First, our tool cannot analyze recursive choreographies, since their pomset-based semantics is infinite. Even if loops are bounded, naive loop unrolling can easily generate large sets of pomsets which are intractable. Secondly, the CC2-POM and CC3-POM conditions are sufficient but not necessary conditions for realisability. In fact, the same set of traces can be expressed using different sets of pomsets by exploring different interleavings. We are currently investigating a notion of normal forms for families of pomsets that can be used to guarantee that our conditions are necessary.
We are also working on optimizing our tool. In particular we think that it is possible to demonstrate equivalence between CC3-POM and a different formulation, which requires checking only a subset of prefixes. For instance, in verifying CC3-POM for $R_3$, the analysis of the prefix A C!l1 A C!r1 also covers the cases of the prefixes A C!l1 and A C!r1.

Figure 1
Figure 1 presents a system with three participants: A, B, and C. Participant C always sends message x to B. Participant A sends two messages to B: the first message is x or y; the second message is always z. Participant B receives the first message from A and C in any order, then it receives the second message of A.

    # a choreography is a list of pomsets
    global_view = []
    # a pomset is defined using a directed graph
    # left pomset of R1
    gr1 = nx.DiGraph()

    cc2c = cc2closure(global_view)  # cc2c is the list of pomsets
    cc2res = cc2pom(cc2c, global_view)

    errors = counterexamples(cc3c, cc3res)
    debug_graphs(errors, "output-folder")  # generates pictures of errors

- inter_closure: implements inter-participant closure
- ccpom: generates the two closure sets and verifies the closure conditions
- dirpom: provides the command line utility

In order to demonstrate the implementation of the analyses and the internal API, we report the implementation of CC3-POM:

    def cc3closure(graphs):
        # retrieves the list of principals in graphs
        principals = pomset.get_all_principals(graphs)
        # projects the input graphs on principals and yields a map mapping
        # principals to list of "local" pomsets (avoids duplicates)
        local_threads = pomset.get_principal_threads(graphs, principals)
        local_prefixes = {}
        for p in principals:
            # computes all prefixes of all graphs in local_threads[p]
            # (avoids duplicates)
            local_prefixes[p] = pomset.get_prefixes(local_threads[p])
        # generates all tuples in the product of local_prefixes
        tuples = inter_closure.make_tuples(local_prefixes)
        # computes the inter-participant closure of all the tuples
        # (avoids duplicates)
        ipc = inter_closure.inter_process_closure(tuples)
        # computes all prefixes of the input graphs (avoids duplicates)
        prefixes = pomset.get_prefixes(graphs)
        return (ipc, prefixes)

    def cc3pom(ipc, prefixes):
        matches = {}
        for i in range(len(ipc)):
            matches[i] = None
            for j in range(len(prefixes)):
                # checks if prefixes[j] is more permissive than ipc[i]
                if pomset.is_more_permissive(prefixes[j], ipc[i]):
                    matches[i] = j
                    break
        return matches

    import networkx.algorithms.isomorphism as iso

    nm = iso.categorical_node_match('label', '')

    def is_more_permissive(g1, g2):
        if len(g1.nodes()) != len(g2.nodes()):
            return False
        m = iso.GraphMatcher(g1, g2, nm)
        return m.subgraph_is_isomorphic()