Gray-Box Conformance Testing for Symbolic Reactive State Machines

Model-based testing (MBT) is typically a black-box testing technique. Therefore, generated test suites may leave some untested gaps in a given implementation under test (IUT). We propose an approach that uses structural and behavioural information extracted from the implementation domain to generate effective and efficient test suites. Our approach considers both specification models and implementation models, and generates an enriched test model which is used to automatically generate test suites. We show that the proposed approach is sound and exhaustive and covers both the specification and the implementation. We examine the applicability and the effectiveness of our approach by applying it to a well-known example from the railway domain.


Introduction
Model-based testing (MBT) has received significant attention in testing complex software systems. The benefit of model-based testing is primarily in automated test case generation and automated analysis of the test results. In an MBT process, test cases are automatically derived from a (preferably formal) model of the specification and are executed on the implementation under test (IUT). MBT is typically a black-box testing technique, in which the implementation is only accessible through its interfaces and thus test data is generally selected based on the specification. Therefore, generated test suites may leave some untested gaps in a given IUT and/or redundantly cover the same logical path several times.
To address this issue, test models and test case generation processes can be enriched with structural or behavioural information extracted from the implementation. This is a promising approach considering the existing techniques for extracting models from implementations, in particular, recent learning-based approaches inferring models from software (e.g., [1,2]). Such models provide an abstraction of the implementation based on its observable behaviour. Using these models in testing improves the coverage of the IUT, up to the accuracy of the extracted model.
This paper proposes a gray-box testing strategy in which test suites are generated considering both the specification and an abstraction of the IUT. With such a test suite, the coverage of the specification model and the coverage of the implementation complement each other and hence more faults can be uncovered. Moreover, such test suites are tailored to a given IUT and thus fewer test cases are generated to satisfy a certain testing goal, in comparison to universal test suites that are supposed to detect faults in any possible implementation. The main contribution of this work is considering the partitioning of the input domain, which can be obtained from (black-box) implementations (e.g., by model learning techniques), in generating test suites. We show that although such information may be generated for different purposes, it can be used in test generation and does improve the coverage of the generated test cases.
In this work, specifications and implementations are modelled with a specific type of transition system, called Symbolic Reactive State Machines (SRSMs). Given the SRSMs of the specification and the IUT, a complete test suite is generated based on the so-called transition composition of these models. In generating test cases, the proposed data selection is justified by a special case of the uniformity hypothesis [3], the theoretical foundation for testing with a finite subset of values.
The rest of the paper is structured as follows: Section 2 provides an overview of the related work. Section 3 introduces the formalism used in this paper and Section 4 defines our notion of conformance. The proposed testing strategy is outlined in Section 5. In Section 6, we provide the experimental results of examining the effectiveness of our approach. Section 7 discusses the future work and concludes the paper.

Background and Related Work
Several black-box test case generation methods are proposed in the literature for various formalisms (e.g., finite state machines [4,5] and labeled transition systems [6]). The completeness of these methods (i.e., specifying all possible behaviour of a system) is typically explained with respect to a specified subset of possible implementations, which is referred to as a fault model [7]. This is because in many practical cases, it is not possible to have a complete test suite as such a test suite would be infinitely large.
Gray-box model-based testing strategies combine black-box model-based testing with white-box testing to tune fault detection with respect to a given implementation. For example, in [8], the structure of the tests is generated using MBT (from the specification model) and then a white-box testing technique is used to find a set of concrete values for parameters that maximise code coverage. The approach presented in this paper, in a similar way, considers the IUT in generating test cases. However, it differs from [8] in that both the structure and the parameters of test cases are influenced by a combination of a test model and information from the implementation.
Our proposed approach builds largely on the promising results from existing learning-based techniques for inferring and extracting models from implementations. Some of the techniques have focused on sequential models, typically in the form of FSMs (e.g., [9,10]), and some on data-dependent behaviour in the form of pre- and post-conditions (e.g., [11]). More recently, EFSMs are considered to infer more complete models (combining control and data). For example, Cassel et al. [2] introduce an active learning algorithm to infer a class of EFSMs. Walkinshaw et al. [1] provide a model inference technique (called MINT) which infers EFSMs from software executions. We believe that the model inference techniques which, in particular, infer EFSMs can provide the required abstract model of implementations in the context of our work (i.e., an inferred model can be translated into our formalism).
There are also a number of models similar to our formalism in the MBT literature, such as action machines (AM) [12], symbolic transition systems (STS) [13], FSMs with symbolic inputs [14], and symbolic input output FSMs (SIOFSM) [15]. SIOFSMs particularly support inputs with infinite domain. We expect that each of these underlying models (and their associated test case generation algorithms) can be adopted in our approach.
Another closely related line of work is equivalence-class-based testing. The theoretical foundation for this approach has been presented in [3] by the uniformity hypothesis, which states that it suffices to check the representatives of sub-domains in which the behaviour is the same among all elements. We discuss the justification of our strategy based on this hypothesis. Huang et al. [16] propose a complete model-based equivalence testing strategy applicable to reactive systems with large, possibly infinite input data types but finite internal and output data. Our approach is inspired by [16] and extends it by replacing the heuristics for refinement with the information extracted from the IUT. It also differs from [16] in that it allows for infinite output domains.

Motivating Example
To motivate this work, we use one of the benchmarks provided in [2], namely the prepaid card, in which the card's balance is limited to 500 SEK, and no more than 300 SEK can be topped up in a single transaction. Fig. 1a illustrates the behaviour of this card for the update balance operation. Variable a is the amount to update the balance of the card, and variable b is the current balance of the card. Labels of the form 'C/O' on transitions state that the transition is triggered by inputs satisfying C and the outputs are updated according to O.
Assume that there is an implementation of this card and we have an abstract model of it which is generated by RaLib [2], depicted in Fig. 1b. As can be observed in Fig. 1b, the learned model introduces a different partitioning of the inputs compared to the specification's. This difference is typically observable between a learned model and the already existing (reference) models. In this work, we suggest considering such information and we show that it improves the coverage of the specification and the IUT in a testing experiment. Note that the abstract models extracted from implementations may not contain the exact input-output relation. They largely provide useful information about the partitioning of the input domain. Accordingly, we mainly consider and use the complementary information about the partitioning of inputs in generating tests.

Preliminaries
For formal reasoning, we need a model of a specification, and also assume that the behaviour of the IUT can be captured by some (unknown) formal model in a given formalism. In the following, we introduce the formalism used in this work to model specifications and abstractions of implementations, and then define conformance in its context.

Symbolic Reactive State Machines
A Symbolic Reactive State Machine (SRSM) is a symbolic representation of the state-based behaviour of a system, with a set of input/output variables. It is symbolic as it explicitly uses the notion of variables, rather than concrete values, in specifying transitions (e.g., data-dependent transitions) and outputs (e.g., output as a function of input variables).

Definition 1 (Symbolic Reactive State Machine (SRSM)). An SRSM $S^*$ is a 6-tuple $(\bar{S}, \bar{s}_0, \delta, \lambda, V, D)$, where
- $\bar{S}$ is the non-empty and finite set of symbolic states,
- $\bar{s}_0 \in \bar{S}$ is the initial symbolic state,
- $V$ is the set of variables such that
Example. Fig. 1a shows the behaviour of our example prepaid card as SRSM 500], $\delta_s$, and $\lambda_s$ are defined based on the given transitions. (Note that the machine remains in the same state and the outputs remain unchanged for any input not satisfying the conditions in the labels.)
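The tuple in Definition 1 can be pictured as a small data structure. The following is only a minimal sketch, assuming a single integer input variable and guards restricted to closed intervals; the names `Interval`, `SRSM`, `step`, and the one-state machine `toy` are hypothetical and are not taken from Fig. 1a. Here `step` returns `None` when no transition is defined for an input.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """A set of input valuations C, restricted here to a closed integer interval."""
    lo: int
    hi: int

    def contains(self, c: int) -> bool:
        return self.lo <= c <= self.hi


# A transition: (guard C, target state, output expression over the input).
Transition = Tuple[Interval, str, Callable[[int], int]]


@dataclass
class SRSM:
    """Symbolic reactive state machine with one input variable (assumption)."""
    initial: str
    transitions: Dict[str, List[Transition]] = field(default_factory=dict)

    def step(self, state: str, c: int) -> Optional[Tuple[str, int]]:
        """Return (next state, output) for input c, or None if no guard matches."""
        for guard, target, out in self.transitions.get(state, []):
            if guard.contains(c):
                return target, out(c)
        return None


# Hypothetical one-state machine: accept top-ups of 0..300 and echo the amount.
toy = SRSM(initial="s0",
           transitions={"s0": [(Interval(0, 300), "s0", lambda a: a)]})
```

Representing guards as sets (here intervals) rather than single values is what makes the machine symbolic: one entry in `transitions` stands for a whole set of concrete transitions.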

Concrete and Symbolic Paths
The behaviour of an SRSM is described in terms of the outputs produced for given inputs, which is formally represented by a set of paths (i.e., sequences of transitions) in the model. In an SRSM model, there are two types of paths, namely concrete paths and symbolic paths, which are defined below.

Definition 2 (Concrete Path). In an SRSM $S^*$
The set of all concrete paths in $S^*$ is denoted by $Path(S^*)$ and for a set of concrete paths $CP$,
The set of all symbolic paths in $S^*$ is denoted by $SymPath(S^*)$ and for a set of symbolic paths $SP$, $In(SP)$ is defined as $\{In(sp) \mid sp \in SP\}$.
Each transition represents a set of concrete transitions and thus, a symbolic path $sp$ specifies a set of concrete paths, called its interpretation.
Definition 4 (Symbolic Path Interpretation). In an SRSM $S^*$, the interpretation of a symbolic path $sp = \bar{s}_0 (C_1, \bar{s}_1) \ldots (C_n, \bar{s}_n)$, denoted by $\llbracket sp \rrbracket$, is the set of concrete paths defined as $\{cp_1, cp_2, \ldots\}$ such that for each $cp_i$ ($i = 1, 2, \ldots$)
where $Out(sp) = \varphi_1 \varphi_2 \ldots \varphi_n$
A symbolic path can be partitioned into a set of subpaths such that these paths do not have any concrete path in common and altogether, they cover all the concrete paths in the main symbolic path.

SRSM Models and Conformance
This section defines our notion of behavioural conformance between two SRSMs. The first statement indicates that all the input sequences defined in the specification should be defined in the IUT. In particular, for non-deterministic behaviour, it indicates that the IUT should have at least one concrete path with the same inputs. Then, the second statement says that for those concrete paths whose inputs are defined in the specification, the IUT should satisfy the specification. The statement also implies that the IUT may have additional behaviour (i.e., sequences of inputs which are not defined in the specification).
The above definition of conformance implies that we need to examine each and every path in $Path(S^*)$ with all paths in $Path(T^*)$ and vice versa in order to detect a non-conformant IUT. However, this is not feasible in most practical contexts (e.g., infinite input domain or a large number of concrete paths). We address this problem by defining conformance in terms of symbolic paths. To do so, we first define two relationships, namely compatibility and containment, for comparing two symbolic paths with each other. These relations allow determining conformance by comparing symbolic paths rather than concrete paths. Subsequently, we show how checking conformance at the symbolic level can be reduced to checking conformance of a finite number of concrete paths in their interpretation.
Example. Consider symbolic paths $sp_1 \in SymPath(S^*_{PPC})$ and $sp'_1 \in SymPath(T^*_{PPC})$, defined as follows. $sp_1$ is not compatible with $sp'_1$, as the input sets of $sp_1$ are not pointwise included in those of $sp'_1$, and therefore $sp_1 \not\prec sp'_1$.
Herein, the main issue is to find out whether two expressions are equivalent. It is not always possible to evaluate and compare two expressions for all the input values, for example when inputs are infinite. To overcome this issue, we introduce and define n-uniformity between two functions (expressions), which is defined w.r.t. the set of inputs on which they are both defined.
Definition 10 (n-Uniformity). Let $f : D_f \to D_O$ and $g : D_g \to D_O$ be two functions where $D_f, D_g \in P(D_I)$. Then, $f$ and $g$ are n-uniform over $D_f \cap D_g$, denoted by $f \approx_n g$, if and only if $n$ is the smallest number for which the following statement holds.
Accordingly, if the degree of uniformity between output functions in two symbolic paths is determined, it is possible to find out whether those paths are compatible or not, and this can be done with a finite number of values. This is explained by the following lemma.
Using the above lemma, for any pair of symbolic paths $sp$ and $sp'$, we can find the minimum number of distinct sequences of inputs required to determine whether one is contained in the other or not. This number, denoted by $DistDeg(sp, sp')$, can be calculated from the n-uniformity between the output functions associated with these paths.
Example. Consider symbolic paths $sp \in SymPath(S^*_{PPC})$ and $sp' \in SymPath(T^*_{PPC})$. The input sets of $sp'$ are pointwise included in those of $sp$ and hence $sp' \prec sp$. The output functions in these models ($\varphi$ and $\varphi'$) are polynomials of degree one, therefore $\varphi \approx_1 \varphi'$ and $DistDeg(sp, sp') = 2$: we can determine whether $sp'$ is contained in $sp$ with two sequences of inputs.
$sp = \bar{s}_0 (\{a \le 300\}, \bar{s}_1)(\{a \le 300\}, \bar{s}_1)$;
Although n-uniformity is an abstract concept, as the above example suggests, in many practical cases it can be determined by statically analysing the model/program expressions.
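The degree-one case of the example can be illustrated in code: two polynomials of degree at most one that agree on two distinct inputs agree everywhere, so two representatives suffice. The sketch below assumes that an n-uniform pair needs n + 1 distinct inputs (generalising the value 2 stated for n = 1); the functions `agree_on` and `dist_deg` and the three output expressions are hypothetical.

```python
from typing import Callable, Sequence


def agree_on(f: Callable[[int], int], g: Callable[[int], int],
             points: Sequence[int]) -> bool:
    """True if f and g give the same value on every sample point."""
    return all(f(x) == g(x) for x in points)


def dist_deg(n: int) -> int:
    """Number of distinct inputs used for an n-uniform pair (assumed to be n + 1)."""
    return n + 1


def spec_out(a: int) -> int:
    return a + 10            # hypothetical degree-one output expression


def good_impl(a: int) -> int:
    return 10 + a            # the same function, written differently


def bad_impl(a: int) -> int:
    return 2 * a + 10        # coincides with spec_out only at a = 0


# Two distinct representatives for the 1-uniform case.
samples = list(range(dist_deg(1)))
```

A single sample at `a = 0` would not distinguish `bad_impl` from `spec_out`, which is why the number of distinct inputs matters.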

Conformance Testing for SRSMs
This section formalises conformance testing in the context of this work and the introduced formal model.

Test case and Test Suite
A test case, defined below, specifies a sequence of inputs and their corresponding expected set of outputs according to the specification.

Definition 11 (Test Case and Test Suite).
1. A test case $tc$ is a tuple $(in_{seq}, out_{seq})$, where $in_{seq}$ is a finite sequence of inputs $c_1 c_2 \ldots c_k$ such that $c_i \in D_I$ for $1 \le i \le k$, and $out_{seq}$ is a set of finite sequences of outputs $\{O_1, O_2, \ldots, O_n\}$ where
2. A test suite is a finite set of test cases.
In the context of this work, test cases are executed on a system one by one: the inputs are given to the system and the outputs are observed. The comparison of the observed behaviour with the expected behaviour determines the test verdict (pass/fail).
Definition 12 (Test Case Execution). Execution of a test case $tc$ on an SRSM $S^*$, denoted by $Exec(tc, S^*)$, gives the sequence of outputs specified by the concrete path $cp \in Path(S^*)$ such that $In(cp) = In(tc)$ and then, $Exec(tc, S^*) = Out(cp)$. If there is no such concrete path, the test case is not applicable on the model, which is denoted by $Exec(tc, S^*) = \bot$.
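Test execution and the pass/fail verdict can be sketched as follows. This is an illustrative sketch only, assuming a deterministic model with one integer input and interval guards; `execute`, `passes`, and the one-state `model` are hypothetical names, and `None` stands for the undefined result $\bot$.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Guard = Tuple[int, int]                               # closed interval [lo, hi]
Transition = Tuple[Guard, str, Callable[[int], int]]  # (guard, target, output expr)
Model = Dict[str, List[Transition]]                   # state -> outgoing transitions
TestCase = Tuple[List[int], Set[Tuple[int, ...]]]     # (inputs, expected output seqs)


def execute(inputs: List[int], model: Model,
            initial: str) -> Optional[Tuple[int, ...]]:
    """Outputs along the concrete path for `inputs`, or None if not applicable."""
    state, outputs = initial, []
    for c in inputs:
        for (lo, hi), target, out in model.get(state, []):
            if lo <= c <= hi:
                outputs.append(out(c))
                state = target
                break
        else:
            return None          # no transition is defined for this input
    return tuple(outputs)


def passes(tc: TestCase, model: Model, initial: str) -> bool:
    """The test case is applicable and the observed outputs are among the expected."""
    inputs, expected = tc
    observed = execute(inputs, model, initial)
    return observed is not None and observed in expected


# Hypothetical model: top-ups of 0..300 are echoed back.
model: Model = {"s0": [((0, 300), "s0", lambda a: a)]}
```

For instance, `passes(([100], {(100,)}), model, "s0")` compares the single observed output with the one expected sequence.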

Complete Test Suite
An ideal test suite should specify all possible behaviours of a system and its specification. Such a test suite is called complete. However, this is not possible in most practical cases. A common approach to address this issue is to restrict the power of a test suite to only detecting conformance or only detecting non-conformance (i.e., soundness and exhaustiveness in [6]). We define completeness in the context of our proposal in that we generate a test suite specifically enriched for testing a particular implementation such that
1. there would be no uncovered symbolic behaviour in any of the models (coverage),
2. none of the test cases fails, if the implementation conforms to the specification (soundness), and
3. for any non-conformant behaviour in the implementation, there is a specific test case which discovers that behaviour (relative exhaustiveness).
Accordingly, a complete test suite is one that satisfies test coverage, soundness, and relative exhaustiveness. In the next section, our proposed testing strategy to generate a complete test suite is presented.

Gray-Box Conformance Testing
In this section, we define the transition composition of two SRSM models, which provides an integrated view of the transitions of both models in one model, regardless of their outputs. We then use this model to generate the target test suite.

Transition Composition
Intuitively, the transition composition is a (sub-)product of the models in that the transition function is defined based on the intersection of transitions.
Definition 17 (Transition Composition). Let $S^* = (\bar{S}, \bar{s}_0, \delta_s, \lambda_s, V, D)$ and $T^* = (\bar{T}, \bar{t}_0, \delta_t, \lambda_t, V, D)$ be two SRSMs with the same I/O variables. $M^* = (\bar{M}, \bar{m}_0, \delta, \emptyset, V, D)$ is the transition composition of $S^*$ and $T^*$, denoted by $M^* = trComp(S^*, T^*)$, where
In a transition composition, the outgoing transitions on each state are defined based on the intersection of the valid input domains of the transitions of the components. The specific symbols $err_s$ and $err_t$ identify situations in which there is a set of inputs defined in one model but not in the other. Note that we keep tracking states involving $err_s$ and $err_t$ as we do not want to lose any possible transition in any of the models.
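The idea of intersecting guards can be sketched in code. The following is not the construction of Definition 17 itself but a simplified illustration, assuming one integer input, interval guards that are disjoint within each state, and the convention (an assumption here) that `err_s` marks inputs undefined in the specification and `err_t` inputs undefined in the implementation model. The function `tr_comp` and the two toy models are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Guard = Tuple[int, int]                        # closed integer interval [lo, hi]
Model = Dict[str, List[Tuple[Guard, str]]]     # state -> [(guard, target)]
Pair = Tuple[str, str]
ERR_S, ERR_T = "err_s", "err_t"                # "undefined in this model" markers


def target_of(model: Model, state: str, c: int) -> Optional[str]:
    """Target state reached from `state` on input c, or None if undefined."""
    for (lo, hi), target in model.get(state, []):
        if lo <= c <= hi:
            return target
    return None


def tr_comp(spec: Model, s0: str, impl: Model,
            t0: str) -> Dict[Pair, List[Tuple[Guard, Pair]]]:
    """Build the reachable part of the composition by intersecting guards."""
    composed: Dict[Pair, List[Tuple[Guard, Pair]]] = {}
    queue = deque([(s0, t0)])
    while queue:
        s, t = queue.popleft()
        if (s, t) in composed:
            continue
        guards = [g for g, _ in spec.get(s, [])] + [g for g, _ in impl.get(t, [])]
        # Cut points split the inputs into segments on which both models
        # select a fixed transition (or none).
        cuts = sorted({lo for lo, _ in guards} | {hi + 1 for _, hi in guards})
        edges: List[Tuple[Guard, Pair]] = []
        for a, b in zip(cuts, cuts[1:]):
            s_next, t_next = target_of(spec, s, a), target_of(impl, t, a)
            if s_next is None and t_next is None:
                continue                        # defined in neither model
            nxt = (s_next or ERR_S, t_next or ERR_T)
            edges.append(((a, b - 1), nxt))
            queue.append(nxt)
        composed[(s, t)] = edges
    return composed


# Hypothetical guards: the specification accepts 0..300; the learned model
# splits the inputs at 100 and additionally accepts 301..350.
spec: Model = {"s0": [((0, 300), "s0")]}
impl: Model = {"t0": [((0, 100), "t0"), ((101, 350), "t0")]}
composition = tr_comp(spec, "s0", impl, "t0")
```

In this toy case the composed state `("s0", "t0")` gets three outgoing guards, 0..100, 101..300 and 301..350, and the last one leads to `("err_s", "t0")`, which keeps tracking the implementation's extra behaviour.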
Corollary 2. Let $S^* = (\bar{S}, \bar{s}_0, \delta_s, \lambda_s, V, D)$, $T^* = (\bar{T}, \bar{t}_0, \delta_t, \lambda_t, V, D)$, $M^* = (\bar{M}, \bar{m}_0, \delta, \emptyset, V, D)$, and $M^* = trComp(S^*, T^*)$. Then for all $\bar{m}, \bar{m}' \in \bar{M}$ and $C \in P(D_I)$ such that $\delta(\bar{m}, C) = \bar{m}'$ the following two statements hold
Example. Fig. 2 shows a part of the transition composition of the models in Fig. 1a and Fig. 1b.
The transition composition of two SRSM models has two main properties which allow generating a complete test suite. First, according to Definition 18, it covers both of its underlying models (Theorem 1). Second, every symbolic path in the transition composition is compatible with at least one symbolic path in one of the underlying models, indicating that the transition composition does not have any extra behaviour (Theorem 2).

Test Suite Generation
Having defined the transition composition of two SRSMs, we next generate a complete test suite. First, we define the test cases for each symbolic path in the transition composition, which are then accumulated in the final and complete test suite. The following theorem demonstrates that a composition-based test suite satisfies the test coverage, soundness and exhaustiveness properties.

Experimental Results
In order to check the effectiveness of our approach, we use our method in the context of a well-known example from the European Train Control System (ETCS), namely the Ceiling Speed Monitor (CSM) module, which monitors the speed of a train and triggers the required actions if the maximal speed is exceeded. A complete description of the system can be found in [19]. We applied our method in testing six different (faulty) implementations of the CSM module and compared the outcomes with random testing and the equivalence class testing introduced in [16]. Implementations are mutants of a correct implementation of the CSM module. In the first implementation ($IUT_1$) the faults are related to boundary values (e.g., < replaced by ≤). In the next four implementations ($IUT_2$, $IUT_3$, $IUT_4$, and $IUT_5$), the faults are in the guard condition, but they are not related to boundary values. Moreover, in $IUT_4$ and $IUT_5$, the difference between the sets of inputs defined by the correct condition and the wrong condition is narrow (i.e., the difference can be discovered only for a limited number of input values). The last implementation ($IUT_6$) contains a fault in an output function associated with one of the transitions.
In the experiment, we mainly investigated the question of whether our method observed the faults or not. We also considered the number of test cases generated by each method. Additionally, in order to have an approximation of the overhead associated with our method, we considered the time required to generate the transition composition. This time is computed based on the number of basic computation steps in generating the composition (assuming that all steps consume a constant amount of time, this time is proportional to the number of steps).
In random testing, test cases are created by generating random values in the appropriate data ranges. For equivalence class testing, we considered a refinement of the initial coarsest input equivalence class partitioning (IECP) that reflects all case distinctions visible in guard conditions of the CSM model, which implies the fault model for this testing method. Note that the number of test cases generated by IECP is the same for all six cases. We used the test data provided in [20] for the number of test cases generated by IECP. For random testing, in each case, a random test suite of the same size as our method's was selected and used for comparison.
Table 1 summarises the results of this experiment. The results show that our method performs better than random testing with the same number of test cases. They also show that in cases where the behaviour of the IUT lies outside the fault domain of the IECP testing, in particular when the input equivalence classes are narrow, our approach performs better than IECP. This is because, in such cases, the desired input values have very low probabilities of being chosen. Therefore, in both random testing and IECP, an increase in the number of test cases has limited effect on their testing strength. The IECP testing could not kill $IUT_4$ and $IUT_5$, which are outside its fault domain and have narrow equivalence classes. $IUT_2$ and $IUT_3$ are both outside the fault domain and the set of inputs to discover their faults is not narrow (i.e., proper input values could be chosen by random input selection). However, only $IUT_3$ was killed by IECP. Finally, the time required to generate the transition composition and the number of test cases could be an indication of the efficiency of our method.
Nevertheless, this experiment provides a preliminary result. In particular, having treated only one type of case study is a threat to the validity of our results. To remedy this, we plan to carry out more testing experiments considering different kinds of cases. To address the efficiency and scalability question more thoroughly, in addition to more case studies, we need to collect additional information from other methods to have a valid comparison between methods, such as the time required to transform the original test model into the desired formalism.

Conclusions and Future Work
In this paper, we presented a gray-box model-based testing strategy in which test suites are generated considering both the specification and an abstraction of the IUT. Specifications and abstractions of implementations are modelled as Symbolic Reactive State Machines (SRSMs), which are finite state machines with symbolic input and output. Given the SRSMs of a specification and an IUT, test cases are generated based on the transition composition of these models. We considered models with infinite input domain and then introduced the notion of n-uniformity, which allows us to confine the number of test cases for each symbolic path. We studied and proved coverage, soundness, and relative exhaustiveness of the proposed approach.
As for future work, we plan to carry out more testing experiments to investigate the applicability of the proposed strategy (in particular, the notion of n-uniformity) in different situations and discover its limitations. Moreover, we plan to study models with an infinite set of symbolic paths and, then, how to select a finite subset of paths sufficient to generate a complete test suite, according to the regularity hypothesis [3]. Finally, we would like to work on efficient algorithms for generating the transition composition (e.g., adapting bi-simulation algorithms) and also for determining n-uniformity.

Fig. 1. The behaviour of the example prepaid card.
- $V$ is partitioned into disjoint sets $I$ and $O$ of input and output variables, respectively,
- $D$ is the range of all variable valuations, with $D_I$ the domain of input variables and $D_O$ the domain of output variables,
- $\delta : \bar{S} \times P(D_I) \to \bar{S}$ is the transition function, and
- $\lambda : \bar{S} \times P(D_I) \to \bar{E}(I)$ is the output function, where $E(I)$ is the set of expressions over input variables ($I$) and $\bar{E}(I) \in \underbrace{E(I) \times \ldots \times E(I)}_{|O|}$, i.e., each expression gives the value of one output variable.
Notations. Input variables are enumerated as $I = \{x_1, \ldots, x_k\}$ and $D_I = D_{x_1} \times \ldots \times D_{x_k}$ is the domain of inputs. $P(D_I)$ is the powerset (the set of all subsets) of $D_I$, and $x = (x_1, \ldots, x_k)$ is the input variable vector. We use small letters (e.g., $c$) to represent a single valuation of the input vector ($x = c \in D_I$) and capital letters (e.g., $C$) to show a set of valuations of the input vector ($C \in P(D_I)$). Symbolic states are labelled with overscored letters (e.g., $\bar{s}$, $\bar{S}$). The Greek letter $\varphi$ is used to represent output functions and it is a vector of expressions (i.e., $\varphi \in \bar{E}(I)$). Given a vector of expressions $\varphi$ and an input $c \in D_I$, $\varphi[c]$ denotes the output vector with the valuation of each expression for input $c$: $\varphi[c] = o \in D_O$.

Definition 7 (Symbolic Path Compatibility). A symbolic path $sp$ is compatible with a symbolic path $sp'$, denoted by $sp \prec sp'$, if and only if $In(sp) \sqsubseteq In(sp')$, where for $In(sp) = C_1 C_2 \ldots C_n$ and $In(sp') = C'_1 C'_2 \ldots C'_n$, $In(sp) \sqsubseteq In(sp')$ holds if and only if $C_i \subseteq C'_i$ for $1 \le i \le n$.
Definition 8. Two expressions $\varphi$ and $\varphi'$ are equivalent over a set of inputs $X \in P(D_I)$, denoted by $\varphi \overset{X}{\equiv} \varphi'$, if and only if $\forall x \in X \bullet \varphi[x] = \varphi'[x]$. If $X = D_I$, then $\varphi$ and $\varphi'$ are equivalent, which is denoted by $\varphi \equiv \varphi'$.
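The compatibility check of Definition 7 reduces to a pointwise subset test on the guard sequences. The sketch below assumes guards are closed integer intervals, so that set inclusion becomes a comparison of bounds; `is_compatible` is a hypothetical helper name.

```python
from typing import List, Tuple

Guard = Tuple[int, int]  # closed integer interval [lo, hi]


def is_compatible(in_sp: List[Guard], in_sp_prime: List[Guard]) -> bool:
    """True if both sequences have the same length and each C_i is a subset of C'_i."""
    if len(in_sp) != len(in_sp_prime):
        return False
    return all(lo2 <= lo1 and hi1 <= hi2
               for (lo1, hi1), (lo2, hi2) in zip(in_sp, in_sp_prime))
```

For example, a path with guards 0..100 at every step is compatible with a path whose guards are 0..300, but not the other way round.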

Fig. 2. An excerpt of the transition composition of $S^*_{PPC}$ and $T^*_{PPC}$.

Definition 20 (Composition-based Test Suite). Given the specification model $S^*$, the implementation model $T^*$, and their transition composition $M^*$, a composition-based test suite, denoted by $CompTS(S^*, T^*)$, is defined as follows.
$CompTS(S^*, T^*) = \bigcup_{sp \in SymPath(M^*)} TC(sp)$

Theorem 3. Let $S^*$ be the specification model, $T^*$ be the implementation model, and $M^* = trComp(S^*, T^*)$. Then, $CompTS(S^*, T^*)$ is a sound and exhaustive test suite and covers $S^*$ and $T^*$.
Definition 9 (Symbolic Path Containment). A symbolic path $sp$ is contained in a symbolic path $sp'$, denoted by $sp \preceq sp'$, if and only if $sp \prec sp' \wedge Out(sp) \equiv Out(sp')$, where for $Out(sp) = \varphi_1 \varphi_2 \ldots \varphi_n$ and $Out(sp') = \varphi'_1 \varphi'_2 \ldots \varphi'_n$, $Out(sp) \equiv Out(sp')$ holds if and only if $\varphi_i$
1. An SRSM $S^*$ passes a test case $tc$, denoted by $Pass(S^*, tc)$, if and only if it is applicable on $S^*$ and $Exec(tc, S^*) \in Out(tc)$. If $S^*$ does not pass a test case $tc$, it fails, denoted by $Fail(S^*, tc)$.
2. An SRSM $S^*$ passes a test suite $TS$, denoted by $Pass(S^*, TS)$, if and only if $\forall tc \in TS \bullet Pass(S^*, tc)$. If $S^*$ does not pass a test suite $TS$, it fails, denoted by $Fail(S^*, TS)$.
Definition 19. Let $S^*$ be the specification model, $T^*$ be the implementation model, and $M^* = trComp(S^*, T^*)$ be the transition composition. For each $sp \in SymPath(M^*)$, $TC(sp)$ is a set of test cases to examine the compatibility between the two symbolic paths in $T^*$ and $S^*$ in which $sp$ is contained, and is defined as follows.
1. If there exist $sp' \in SymPath(S^*)$ and $sp'' \in SymPath(T^*)$ such that $sp \prec sp'$ and $sp \prec sp''$, then $TC(sp)$ is a set of test cases $\{tc_1, \ldots, tc_k\}$, where $k = DistDeg(sp', sp'')$, such that $In(tc_i) \subseteq In(\llbracket sp \rrbracket)$ and $Out(tc_i)$ is determined by the output(s) produced by $S^*$ for $In(tc_i)$, $1 \le i \le k$.
2. If there exists $sp' \in SymPath(S^*)$ such that $sp \prec sp'$ and there is no $sp'' \in SymPath(T^*)$ such that $sp \prec sp''$, then $TC(sp)$ contains only one test case $tc$ such that $In(tc) \subseteq In(\llbracket sp \rrbracket)$ and $Out(tc)$ is the output(s) produced by $S^*$ for $In(tc)$.
3. If there exists $sp'' \in SymPath(T^*)$ such that $sp \prec sp''$ and there is no $sp' \in SymPath(S^*)$ such that $sp \prec sp'$, then $TC(sp)$ contains only one test case $tc$ such that $In(tc) \subseteq In(\llbracket sp \rrbracket)$ and $Out(tc) = \bot$ (i.e., undefined). Note that such a test case observes the behaviours not specified in the specification.
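The three cases of Definition 19 can be sketched as a small selection routine. This is an illustrative simplification, assuming interval guards over one integer input, a deterministic specification (so each test case carries a single expected output sequence rather than a set), and a naive way of picking representatives that yields distinct sequences only when some guard is wide enough. The names `representatives`, `tc_for_path`, and `spec_exec` are hypothetical, and `None` stands for $\bot$.

```python
from typing import Callable, List, Optional, Tuple

Guard = Tuple[int, int]                                   # closed interval [lo, hi]
TestCase = Tuple[List[int], Optional[Tuple[int, ...]]]    # (inputs, expected or None)


def representatives(in_sp: List[Guard], k: int) -> List[List[int]]:
    """Pick up to k distinct input sequences from the guards of a symbolic path."""
    picked: List[List[int]] = []
    for i in range(k):
        seq = [min(lo + i, hi) for lo, hi in in_sp]       # stay inside each guard
        if seq not in picked:
            picked.append(seq)
    return picked


def tc_for_path(in_sp: List[Guard], in_spec: bool, in_impl: bool, dist_deg: int,
                spec_exec: Callable[[List[int]], Tuple[int, ...]]) -> List[TestCase]:
    """Test cases for one symbolic path of the composition, following the three cases."""
    if in_spec and in_impl:
        # Case 1: both models cover the path; use DistDeg representatives.
        return [(seq, spec_exec(seq)) for seq in representatives(in_sp, dist_deg)]
    seq = representatives(in_sp, 1)[0]
    if in_spec:
        # Case 2: only the specification covers the path; one test suffices.
        return [(seq, spec_exec(seq))]
    # Case 3: only the implementation model covers the path; expected is undefined.
    return [(seq, None)]
```

With a hypothetical `spec_exec = lambda seq: tuple(seq)` and guards `[(0, 100), (101, 300)]`, case 1 with `dist_deg = 2` selects the input sequences `[0, 101]` and `[1, 102]`.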

Table 1. Experimental results