Test Generation by Constraint Solving and FSM Mutant Killing

Abstract. The problem of fault model-based test generation from formal models, in this case Finite State Machines, is addressed. We consider a general fault model which is a tuple of a specification, a conformance relation and a fault domain. The specification is a deterministic FSM which can be partially specified and not reduced. The conformance relation is quasi-equivalence, as all implementations in the fault domain are assumed to be completely specified FSMs. The fault domain is the set of all possible deterministic submachines of a given nondeterministic FSM, called a mutation machine. The mutation machine contains the specification machine and extends it with mutated transitions modelling potential faults. An approach for deriving a test suite which is complete (sound and exhaustive) for the given fault model is elaborated. It is based on our previously proposed method for analyzing test completeness by logical encoding and SMT solving. Preliminary experiments performed on an industrial controller indicate that the approach scales sufficiently well.


Introduction
Fault model-based testing receives constantly growing interest from both researchers and test practitioners. Fault models are defined in the literature in a variety of ways [16]. The work [10] proposes to define a fault model as a tuple of a specification, a conformance relation and a fault domain. In the context of testing from finite state machines, the specification is a certain type of FSM. A conformance relation is specific to the FSM type: for completely specified deterministic machines it is equivalence, while for partially specified machines it is quasi-equivalence. The fault domain is a set of implementation machines, aka mutants, each of which models some faults, such as output, transfer and transition faults.
In the traditional checking experiment theory, the fault domain is the universe of all machines with a given number of states and the input and output alphabets of the specification, see, e.g., [8,11,12,7,13]. Checking experiments are in fact sound and exhaustive, i.e., complete tests. However, their size for realistic specifications is often considered too big for practical applications. To us, this is the price to pay for considering the universe of all FSMs. Intuitively, choosing a reasonable subset of this fault domain might mitigate the test explosion effect. For example, if one considers the fault domain of mutants that model only output faults, a test complete for this fault model is simply a transition tour. In our opinion, fault domains intermediate between these two extremes have not yet received sufficient attention.
To define a fault domain which is a subset of the universe of all FSMs, one could explicitly enumerate mutants, as in program or model-based mutation testing, see, e.g., [1,2,3,21], or avoid this enumeration by defining a fault domain as the set of all possible submachines of a given nondeterministic FSM, called a mutation machine [4,9,6]. The mutation machine contains the specification machine as a submachine; its additional transitions model potential faults. Several methods were developed for test generation using this fault model [4,9,6,22]. All these methods are adaptations of classical checking experiments to a fault domain defined by a mutation machine. A checking experiment is in fact a complete test suite; however, the use of the state identification approach imposes limitations on the fault model. First, the specification machine must be completely specified and reduced, so that state identifiers exist. Second, the mutation machine was defined only for such specification machines, so the existing methods are not applicable to partial specification machines and mutation machines derived from them. Finally, the state identification approach does not support iterative test generation with a mutation machine, i.e., it does not let the tester stop the process before a complete test suite for the given fault model is obtained when scalability problems force a compromise between fault coverage and test length.
Addressing the above limitations, in our recent work [20] we developed a method for analyzing test completeness for a fault model with a mutation machine. The analysis is based on logical encoding and SMT solving; it avoids the enumeration of mutants while still offering a possibility to estimate test adequacy (mutation score). This method paves the way to a test generation approach which uses the results of the analysis to find tests killing the mutants that survived the current test suite, and iterates until a test suite complete for the given fault model is obtained or the tester decides to stop earlier. The main goal of this paper is to elaborate this iterative test generation approach, which is based on test completeness analysis and does not require the specification machine to be complete and reduced.
The remainder of this paper is organized as follows. Section 2 defines a specification model as well as a fault model. In Section 3, we develop an approach for complete test suite generation for a given fault model with a mutation machine. Section 4 reports some results of an experimental evaluation of the approach. Section 5 summarizes our contributions and indicates future work.
A trace of M in state s is a string of input-output pairs which label an execution from s. Let $Tr_M(s)$ denote the set of all traces of M in state s and $Tr_M$ denote the set of traces of M in the initial state. Given a sequence $\beta \in (IO)^*$, the input (output) projection of $\beta$, denoted $\beta{\downarrow}_I$ ($\beta{\downarrow}_O$), is the sequence obtained from $\beta$ by erasing the symbols in O (I). Given a trace $\beta$ in state s, the input projection $\beta{\downarrow}_I$ is an input sequence defined in state s. We use $\Omega_M(s)$ to denote the set of all the input sequences defined in state s and $\Omega_M$ to denote the set of all the input sequences defined in state $s_0$. Clearly, if M is complete then $\Omega_M = I^*$.
We say that an input sequence triggers an execution of M (in state s) if it is the input projection of a trace of the execution of M (in state s).
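Given an input sequence $\alpha \in \Omega_M(s)$, let $out_M(s, \alpha)$ denote the set of all output sequences which can be produced by M in response to $\alpha$ at state s, that is, $out_M(s, \alpha) = \{\beta{\downarrow}_O \mid \beta \in Tr_M(s) \text{ and } \beta{\downarrow}_I = \alpha\}$.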
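To make the projections and $out_M(s, \alpha)$ concrete, the following Python sketch computes them for a small, possibly partial and nondeterministic FSM. The dictionary encoding of machines and all function names are hypothetical choices for illustration, not part of the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (state, input) -> list of (output, next_state) pairs.
# A missing key means the input is undefined in that state; several pairs
# mean the machine is nondeterministic on that input.
FSM = Dict[Tuple[str, str], List[Tuple[str, str]]]


def input_projection(trace: List[Tuple[str, str]]) -> Tuple[str, ...]:
    """Erase the outputs of a trace given as (input, output) pairs."""
    return tuple(x for x, _ in trace)


def output_projection(trace: List[Tuple[str, str]]) -> Tuple[str, ...]:
    """Erase the inputs of a trace given as (input, output) pairs."""
    return tuple(o for _, o in trace)


def out(m: FSM, s: str, alpha: Tuple[str, ...]) -> Set[Tuple[str, ...]]:
    """All output sequences m can produce in response to alpha from state s.

    The result is empty when alpha is not defined in s.
    """
    # Each configuration is (current state, outputs produced so far).
    configs = {(s, ())}
    for x in alpha:
        configs = {
            (nxt, outs + (o,))
            for state, outs in configs
            for o, nxt in m.get((state, x), [])
        }
    return {outs for _, outs in configs}
```

For instance, with `m = {("1", "a"): [("0", "2")], ("2", "a"): [("0", "1"), ("1", "2")]}`, the call `out(m, "1", ("a", "a"))` collects both output sequences that the nondeterministic second step allows.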
We define several relations between states in terms of traces. Given states $s_1, s_2$ of … M is reduced if any pair of its distinct states are distinguishable, i.e., for all distinct $s_1, s_2 \in S$ there exists $\alpha \in \Omega_M(s_1) \cap \Omega_M(s_2)$ such that $out_M(s_1, \alpha) \neq out_M(s_2, \alpha)$; $\alpha$ is called a distinguishing sequence for states $s_1$ and $s_2$, which is denoted $s_1 \not\simeq_\alpha s_2$ or simply $s_1 \not\simeq s_2$.
We also use relations between machines. Given FSMs $M = (S, s_0, I, O, T)$ and $N = (P, \ldots$
In this paper, we assume that a specification machine is a DFSM which can be complete or partial, whereas all implementation machines are complete DFSMs. This implies that we should use the quasi-equivalence relation [17].
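The definition of quasi-equivalence is not reproduced above. Assuming the usual one (a complete implementation N is quasi-equivalent to a partial specification A if $out_N(p_0, \alpha) = out_A(s_0, \alpha)$ for every $\alpha \in \Omega_A$), the check for deterministic machines reduces to exploring the state pairs reachable via inputs defined in A. The sketch below is a hypothetical illustration, not the authors' implementation.

```python
from collections import deque
from typing import Dict, Tuple

# Hypothetical encoding of a DFSM: (state, input) -> (output, next_state).
# A missing key means the input is undefined in that state (partial machine).
DFSM = Dict[Tuple[str, str], Tuple[str, str]]


def quasi_equivalent(spec: DFSM, s0: str, impl: DFSM, p0: str) -> bool:
    """Check that the complete DFSM impl produces the same outputs as the
    (possibly partial) DFSM spec on every input sequence defined in spec."""
    defined: Dict[str, list] = {}  # spec state -> inputs defined in it
    for s, x in spec:
        defined.setdefault(s, []).append(x)

    seen = {(s0, p0)}
    queue = deque([(s0, p0)])
    while queue:
        s, p = queue.popleft()
        for x in defined.get(s, []):
            o_spec, s_next = spec[(s, x)]
            o_impl, p_next = impl[(p, x)]  # impl is assumed complete
            if o_spec != o_impl:
                return False  # some defined input sequence reveals a difference
            if (s_next, p_next) not in seen:
                seen.add((s_next, p_next))
                queue.append((s_next, p_next))
    return True
```

Inputs undefined in the specification are never explored, so implementations may behave arbitrarily on them, which is exactly what distinguishes quasi-equivalence from equivalence.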

Fault Model
We define the so-called mutation machine for a given specification machine by generalizing the definition previously given only for a complete specification FSM [4,9,6,20,22] so that the latter may be partially specified. The specification machine A in Fig. 1 is a partial DFSM, where input b is not specified in state 2. The machine is not reduced, since state 3 is quasi-equivalent to state 2. None of the existing methods for test generation using mutation machines [4,9,6,22] can be applied to such a machine, as they assume that the specification machine is complete and reduced, as required by the state identification approach.
The mutation machine M in Fig. 1 has three mutated transitions, one representing an output fault and the other two transfer faults. It also has 14 suspicious transitions, eight of which are don't care transitions.
The mutation machine M represents mutants as its deterministic submachines. Their number is given by the following formula:
$$|Sub(M)| = \prod_{(s, x) \in S \times I} |T_{sx}|,$$
where $T_{sx}$ is the set of transitions of M from state s on input x. In our running example, the number of mutants is $8 \times 2 \times 2 \times 2 = 64$. In the extreme case considered in classical checking experiments, a fault domain is the universe of all machines with at most n states, where n is the number of states of the specification machine, over its input and output alphabets. The corresponding mutation machine then becomes a chaos machine with all possible transitions between each pair of states. We use Chaos(A, n) to denote such a mutation machine for A. The number of FSMs it represents is the product of the numbers of states and outputs raised to the power of the product of the numbers of states and inputs, i.e., $(n|O|)^{n|I|}$.
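The counting formula is easy to check numerically. The sketch below, with hypothetical function names, multiplies the sizes of the sets $T_{sx}$ and evaluates the chaos-machine count $(n|O|)^{n|I|}$. The factors 8, 2, 2, 2 are those of the running example; pairs with a single trusted transition contribute a factor of 1 and can be omitted.

```python
from math import prod
from typing import Iterable


def count_mutants(set_sizes: Iterable[int]) -> int:
    """|Sub(M)|: product of |T_sx| over all (state, input) pairs.

    Pairs with a single trusted transition contribute a factor of 1,
    so it suffices to pass the sizes of the suspicious sets.
    """
    return prod(set_sizes)


def count_chaos(n_states: int, n_inputs: int, n_outputs: int) -> int:
    """FSMs represented by Chaos(A, n): each of the n*|I| (state, input)
    pairs may take any of the n*|O| (output, next state) combinations."""
    return (n_states * n_outputs) ** (n_states * n_inputs)


print(count_mutants([8, 2, 2, 2]))             # 64
print(count_mutants([3] * 12 + [2] * 17))      # 69657034752
```

The second call reproduces the factors $3^{12} \times 2^{17}$ reported for the automotive controller in Section 4.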

Mutation testing
In the case where M = Chaos(A, n), a complete test suite is called n-complete. This notion coincides with the classical notion of checking experiments for the fault domain consisting of FSMs with at most n states [5,7,10].
In the domain of program mutation testing, such a test suite is often called adequate for a program relative to a finite collection of programs (in our case the set Sub(M)), see, e.g., [3].
For deterministic FSMs, tests that kill a given mutant FSM can be obtained from the product of the two machines, see, e.g., [2,1,17]. This approach can also be used to check whether a given test kills mutants, but it requires mutant enumeration.
In this work, we develop an approach for complete test suite generation for the fault model $\langle A, \simeq, Sub(M) \rangle$, where A can be a partial or complete FSM, not necessarily reduced. It is based on mutant killing, but does not check mutants one by one, thus avoiding their full enumeration.

Distinguishing automaton
Tests detecting mutants of the specification can be determined using a product of the specification and mutation machines obtained by composing their transitions as follows.
Definition 2 [20]. …
Notice that for any nonconforming mutant there exists an input sequence of length at most $n^2$ detecting it, where n is the number of states of the specification machine, since a distinguishing automaton has no more than $n^2$ states.
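Definition 2 is not reproduced above. The following sketch assumes the usual product construction: a state of the distinguishing automaton pairs a state of A with a state of M, an input defined in A moves both machines when their outputs agree, and any output disagreement leads to a single sink state. A breadth-first search then yields a shortest input sequence reaching the sink. All names and the encoding are hypothetical. Such a path may use two different transitions of M from the same state on the same input, i.e., a nondeterministic execution that corresponds to no mutant.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

SINK = "SINK"  # hypothetical name of the sink state

Spec = Dict[Tuple[str, str], Tuple[str, str]]            # (s, x) -> (o, s')
Mutation = Dict[Tuple[str, str], List[Tuple[str, str]]]  # (m, x) -> [(o', m'), ...]


def distinguishing_automaton(spec: Spec, mut: Mutation, s0: str):
    """Reachable part of the product of spec and mut.

    Returns the initial state and a map ((s, m), x) -> set of successors,
    where a successor is a state pair or SINK (output mismatch)."""
    init = (s0, s0)
    trans: Dict[Tuple[Tuple[str, str], str], Set] = {}
    seen = {init}
    queue = deque([init])
    while queue:
        s, m = queue.popleft()
        for (state, x), (o, s_next) in spec.items():
            if state != s:
                continue  # only inputs defined in the specification state
            succ = set()
            for o_mut, m_next in mut.get((m, x), []):
                succ.add((s_next, m_next) if o_mut == o else SINK)
            trans[((s, m), x)] = succ
            for d in succ:
                if d != SINK and d not in seen:
                    seen.add(d)
                    queue.append(d)
    return init, trans


def shortest_to_sink(init, trans) -> Optional[Tuple[str, ...]]:
    """Breadth-first search for a shortest input sequence reaching SINK."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        c = queue.popleft()
        for (src, x), succ in trans.items():
            if src != c:
                continue
            for d in succ:
                if d == SINK:
                    path, node = [x], c
                    while parent[node] is not None:
                        node, x_prev = parent[node]
                        path.append(x_prev)
                    return tuple(reversed(path))
                if d not in parent:
                    parent[d] = (c, x)
                    queue.append(d)
    return None  # the sink is unreachable
```

Since the automaton has at most $n^2$ non-sink states, the sequence returned by the search has length at most $n^2$, in line with the bound stated above.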
An input sequence $\alpha \in L_D$ triggers an execution in the distinguishing automaton D which is defined by an execution in the specification machine A and some execution in the mutation machine M triggered by $\alpha$. To represent a mutant, the latter must be deterministic. Such a deterministic execution of the mutation machine M defining an execution of the distinguishing automaton D to the sink state is called $\alpha$-revealing. An input sequence triggering revealing executions enjoys the nice property of being able to detect mutants. Moreover, all its extensions also detect at least the same mutants.
Theorem 2 [20]. Given an input sequence $\alpha \in \Omega_A$ such that $\alpha \in L_D$, an $\alpha$-revealing execution includes at least one mutated transition; moreover, each mutant which has this execution is detected by the input sequence $\alpha$.
Given an input sequence $\alpha \in L_D$, the question arises how all the mutants (un)detected by this input sequence can be characterized. We address this question in the next section.

Characterisation of mutants (un)detected by an input sequence
Consider an input sequence $\alpha \in \Omega_A$ whose prefixes trigger $\alpha$-revealing executions. These executions characterize the mutants detected by $\alpha$, since each of them defines a distinct set of suspicious transitions involved in the execution. Based on these sets, we can build a constraint on the transition sets of mutants undetected by $\alpha$. This can be achieved by using a distinguishing automaton constrained to a given input sequence. Let Pref($\alpha$) be the set of all prefixes of $\alpha$. We define a linear automaton $(Pref(\alpha), \varepsilon, I, D_\alpha)$, such that each prefix of $\alpha$ is a state, and $(\beta, \ldots$
Definition 3 [20]. Given a specification machine $A = (S, s_0, I, O, N$ …
For $\alpha$ = bababa in our running example, the $\alpha$-distinguishing automaton for A and M is shown in Fig. 3.
Fig. 3. The $\alpha$-distinguishing automaton D for the specification machine A and mutation machine M in Fig. 1, where $\alpha$ = bababa.
There are eleven executions of the mutation machine listed below which are defined by five executions of the -distinguishing automaton reaching the sink state in Fig. 3. Suspicious transitions are in bold font and the others are trusted transitions. Transitions of the specification are underlined. Three executions, namely executions 9, 10, and 11, are non-deterministic. The first eight executions belong to mutants detected by bababa.
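The enumeration of the executions of the mutation machine along $\alpha$ can be sketched as follows, under a hypothetical dictionary encoding of machines. Each execution is the list of transitions of M used until the first output mismatch with A; it is $\alpha$-revealing only if it never uses two different transitions for the same state and input. In the toy data of the usage example, one of the two executions found is nondeterministic, like executions 9, 10 and 11 above.

```python
from typing import Dict, List, Tuple

# Hypothetical encodings (not from the paper):
Spec = Dict[Tuple[str, str], Tuple[str, str]]            # (s, x) -> (o, s')
Mutation = Dict[Tuple[str, str], List[Tuple[str, str]]]  # (m, x) -> [(o', m'), ...]
Transition = Tuple[str, str, str, str]                   # (m, x, o', m')


def executions_to_sink(spec: Spec, mut: Mutation, s0: str,
                       alpha: str) -> List[List[Transition]]:
    """Executions of mut along prefixes of alpha that end with the first
    output mismatch against spec (paths to the sink state)."""
    result: List[List[Transition]] = []
    # Each run: (spec state, mutation machine state, transitions used so far).
    runs = [(s0, s0, [])]
    for x in alpha:
        next_runs = []
        for s, m, used in runs:
            if (s, x) not in spec:
                continue  # alpha is assumed to be defined in spec
            o, s_next = spec[(s, x)]
            for o_mut, m_next in mut.get((m, x), []):
                t = (m, x, o_mut, m_next)
                if o_mut != o:
                    result.append(used + [t])  # output mismatch: sink reached
                else:
                    next_runs.append((s_next, m_next, used + [t]))
        runs = next_runs
    return result


def is_deterministic(execution: List[Transition]) -> bool:
    """True iff the execution never uses two different transitions
    from the same state on the same input."""
    chosen: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for m, x, o, m_next in execution:
        if chosen.setdefault((m, x), (o, m_next)) != (o, m_next):
            return False
    return True
```

For example, with `spec = {("1", "a"): ("0", "2"), ("2", "a"): ("1", "1")}` and a mutation machine adding the transfer fault `("0", "1")` on `("1", "a")`, the input sequence `"aa"` yields two executions to the sink, of which only the second is deterministic.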
Given a pair $(s, x) \in S \times I$ such that $T_{sx}$ is a suspicious set of transitions, we introduce an auxiliary variable $z_{sx}$ which takes values from the indexes of the transitions of the mutation machine in $T_{sx}$.
Each revealing execution e of the mutation machine involving the set of suspicious transitions {t1, t2, …, tn} yields a clause ce = ((zs 1 x 1  t1)  (zs 2 x 2  t2)  …  (zs n x n  tn)) where si and xi are the source state and the input of transition ti for 1 ≤ i ≤ n. The clause ce is satisfied whenever zs i x i is not ti for some 1 ≤ i ≤ n. A solution of ce excludes at least one transition in e.
Clearly, the conjunction of these clauses always has a solution in which the variables determine all the unaltered transitions, since by Theorem 2 every revealing execution includes a mutated transition; but to find nonconforming mutants we need a solution, if it exists, with at least one mutated transition. To this end, we add the constraint $(z_{3a} \neq 12) \vee (z_{3b} \neq 15) \vee (z_{4a} \neq 17)$, which excludes the solutions defining the specification machine augmented with an arbitrary don't care transition, called a completed specification machine.
The constraint C(characterizing the mutants undetected by an input sequence is the conjunction of a clause excluding all completed specification machines and the clauses generated for every revealing execution. Any existing constraint solver, e.g., Z3 [15], could be used for satisfiability checking. A solution of C() if it exists is an assignment of the auxiliary variables. The mutant defined by such a solution includes the transitions specified by the solution along with all the trusted transitions of the mutation machine. Any nonconforming mutant detected by cannot be defined by any solution of C(), only conforming mutants can.
Algorithm 1 presents a procedure that builds a constraint $C_{TS}$ for a given test suite TS out of the constraints for each test in the test suite.
In the running example, to solve the constraint formula C(bababa), we use the SMT solver Z3 [15], which finds the solution $z_{2b} = 11$, $z_{3a} = 13$, $z_{3b} = 15$, $z_{4a} = 17$. The solution defines a mutant with all trusted transitions, one don't care transition $(2, b, 1, 4)_{11}$ and one mutated transition $(3, a, 1, 3)_{13}$. The mutant is presented in Fig. 4. The mutant is nonconforming, which can be verified with the help of the distinguishing automaton obtained for the specification machine and the mutant, also shown in Fig. 4.
Notice that the solver could find another solution of C(bababa), namely, $z_{2b} = 11$, $z_{3a} = 12$, $z_{3b} = 15$, $z_{4a} = 17$, which defines a conforming mutant. Given an initial test suite $TS_{init}$, the question arises how to augment $TS_{init}$ with new input sequences to detect all nonconforming mutants. We elaborate a complete test suite generation procedure in the next section.

Complete Test Suite Generation
We are given a test suite $TS_{init} \subseteq \Omega_A$ and a fault model $\langle A, \simeq, Sub(M) \rangle$. We want to add test cases to $TS_{init}$ to obtain a complete test suite. The constraints defined in the previous section can be used to analyse the completeness of a test suite, as elaborated in our previous work [20]. If the constraint for a test suite has no solution, the test suite is complete. If the first solution computed by a solver defines a nonconforming mutant, we can immediately assert the incompleteness of the test suite. If, however, the solution defines a conforming mutant, the search continues in such a way that this mutant will not be found again in the next round of satisfiability checking. This process iterates until no new solution is found or the generated solution defines a nonconforming mutant. A witness nonconforming mutant can be used to determine a test case which detects it. A new test may kill other nonconforming mutants as well; hence the constraints generated by this test should be added to those of the current test suite. The search terminates when the current constraint is unsatisfiable, which indicates that the test suite is complete.
The test generation procedure is formalized in Algorithm 2. The algorithm has two loops. The coverage analysis loop includes the statements in lines 7 to 9 and the test generation loop includes the statements in lines 4 to 13; the former loop is nested in the latter. When the current test suite is not complete, the test generation loop augments it (in line 4) with a new test case and updates the constraint of the current test suite with that of the generated test case. Then the coverage analysis loop checks the completeness of the updated test suite, searching for a new nonconforming mutant. To this end, constraints excluding the conforming mutants defined by the found solutions are iteratively added to the current constraint. The procedure terminates when the resulting constraint is unsatisfiable, indicating that the test suite is complete.
Proof.
The procedure TestSuiteGen terminates as the test generation loop terminates. It does so because a mutation machine has a finite number of submachines and the solution defining a particular mutant is generated at most once. According to Theorem 2, the computed tests are revealing input sequences. On termination of the procedure the final constraint characterizing undetected mutants excludes all conforming mutants and it is unsatisfiable. Based on Theorem 3 we have that the test suite returned by the procedure is complete.
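The two nested loops of the procedure can be sketched as the following higher-order Python function. Algorithm 2 itself is not reproduced here; the callables passed in (computing a test's constraint, solving, checking conformance, deriving a killing test and excluding a conforming mutant) are hypothetical placeholders for the steps described in the text, and the exclusion of completed specification machines is assumed to be part of the constraint returned for each test.

```python
from typing import Any, Callable, List, Optional, Sequence


def gen_test_suite(
    ts_init: Sequence[str],
    clauses_for: Callable[[str], List[Any]],      # hypothetical: C(alpha) of one test
    solve: Callable[[List[Any]], Optional[Any]],  # hypothetical: mutant of a solution or None
    conforms: Callable[[Any], bool],              # hypothetical: quasi-equivalence check
    killing_test: Callable[[Any], str],           # hypothetical: test detecting a mutant
    exclude: Callable[[Any], List[Any]],          # hypothetical: clause ruling out one mutant
) -> List[str]:
    suite = list(ts_init)
    # Constraint of the current test suite: conjunction of its tests' clauses.
    constraint: List[Any] = [c for t in suite for c in clauses_for(t)]
    while True:  # test generation loop
        # Coverage analysis loop: exclude conforming mutants found by the solver.
        mutant = solve(constraint)
        while mutant is not None and conforms(mutant):
            constraint.extend(exclude(mutant))
            mutant = solve(constraint)
        if mutant is None:
            return suite  # constraint unsatisfiable: the suite is complete
        # A nonconforming witness: add a test killing it and its constraint.
        test = killing_test(mutant)
        suite.append(test)
        constraint.extend(clauses_for(test))
```

In a toy run, mutants can be integers and a constraint a list of excluded mutants: with mutants 0 to 4, where 0 and 1 conform, a test "t2" killing mutants 2 and 3, and a test "t4" killing mutant 4, starting from an empty suite yields `["t2", "t4"]`, and each mutant is returned by the solver at most once.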
To generate a complete test suite for the running example, we consider the initial test suite $TS_{init}$ = {bababa} used in Section 3. The conjunction of C with the eight constraints excluding eight conforming mutants is unsatisfiable. The procedure returns the complete test suite TS = {bababa, baa, babaaba}. Notice that it generated only ten of the 64 mutants in total.
The procedure TestSuiteGen also generates a complete test suite when started from an initial test suite containing only the empty input sequence.
In the next section we present experimental results obtained with a prototype tool.

Experimental results
We have developed a prototype tool implementing the proposed method for complete test suite generation. In this section we present the tool and some experimental results using it.

Prototype tool
The prototype tool is composed of four modules: an I/O module, a completeness checking module, a test generation module and a module for solver execution. The I/O module converts input data into an internal representation for processing, and the obtained results into a human-readable format. To this end, it implements an ANTLR-based parser [19] to interpret the mutation machine specified in a text format; it also parses the output of the SMT solver Z3 [15] to extract a solution and build a mutant. The completeness checking module builds $\alpha$-distinguishing automata, determines revealing executions of the mutation machine and generates constraints for the solver. The test generation module iteratively calls the completeness checking module. The prototype can also be used with other SMT solvers compatible with the SMT-LIB 2.0 standard.
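Since the tool communicates with the solver through SMT-LIB, a constraint $C(\alpha)$ can be emitted as an SMT-LIB 2 script. The following sketch is a hypothetical illustration of such an encoding, not the prototype's actual output format: each $z_{sx}$ becomes an integer constant restricted to the indexes in $T_{sx}$, and each clause becomes an assertion of a disjunction of disequalities. The domain {12, 13} for $z_{3a}$ in the usage example follows the running example.

```python
from typing import Dict, List, Tuple

Var = Tuple[str, str]           # (state, input) owning a suspicious set T_sx
Clause = List[Tuple[Var, int]]  # disjunction of literals z_sx != index


def var_name(v: Var) -> str:
    # Hypothetical naming scheme mirroring z3a, z2b, ... in the text.
    return "z" + v[0] + v[1]


def disjunction(literals: List[str]) -> str:
    # Core SMT-LIB 'or' takes at least two arguments, so special-case 0 and 1.
    if not literals:
        return "false"
    if len(literals) == 1:
        return literals[0]
    return "(or " + " ".join(literals) + ")"


def to_smtlib(domains: Dict[Var, List[int]], clauses: List[Clause]) -> str:
    variables = sorted(domains)
    names = [var_name(v) for v in variables]
    lines = ["(set-option :produce-models true)", "(set-logic QF_LIA)"]
    for v, n in zip(variables, names):
        lines.append(f"(declare-fun {n} () Int)")
        # Each z_sx must be the index of one transition in T_sx.
        lines.append("(assert " + disjunction([f"(= {n} {i})" for i in domains[v]]) + ")")
    for c in clauses:
        literals = [f"(not (= {var_name(v)} {i}))" for v, i in c]
        lines.append("(assert " + disjunction(literals) + ")")
    lines += ["(check-sat)", "(get-value (" + " ".join(names) + "))"]
    return "\n".join(lines)


print(to_smtlib({("3", "a"): [12, 13]}, [[(("3", "a"), 12)]]))
```

Using `declare-fun` with zero arguments and `get-value` keeps the script within the SMT-LIB 2.0 command set; the solver's reply to `get-value` then gives the assignment from which a mutant is built.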
For the experiments we use a desktop computer with the following settings: 3.4 GHz Intel Core i7-3770 CPU, 16.0 GB of RAM, Z3 4.3.2, and ANTLR 4.5.1.

Test Generation for an Automotive Controller
We consider as a case study an automotive controller of the air quality system (HVAC), which we also used in our previous work [18,20]. The functionality of the controller is to set an air source position depending on its current state and input from the environment.
The controller, initially specified as a hierarchical Simulink Stateflow model, is converted into an FSM with 14 states, 24 inputs and 24 × 14 = 336 transitions.
Several mutation machines were used in the experiments. The first one, Mhvac, was obtained by adding 46 mutated transitions to the specification machine (details are available in [20]). The formula in Section 2 gives $3^{12} \times 2^{17}$ = 69,657,034,752 mutants.
The other mutation machines were built by adding more mutated transitions to Mhvac.
In particular, 20, 100, 428, 764 and 1000 mutated transitions were randomly added, resulting in five more mutation machines, M+20, M+100, M+428, M+764 and M+1000. Tab. 1 presents the numbers of mutated transitions, mutants and generated tests, as well as the computation time. Every generated mutant was nonconforming, so the number of generated mutants coincides with the number of tests; conforming mutants were never generated. The third column of Tab. 1 represents average values over 30 mutation machines randomly generated by adding 20 mutated transitions to Mhvac. The experimental results indicate that the approach scales sufficiently well on a typical automotive controller, even with a large number of mutants.

Conclusions
In this paper we focused on generation of a complete test suite detecting all nonconforming implementations in a fault domain defined by a mutation machine. A mutation machine is a nondeterministic FSM, interpreted as a compact representation of a set of deterministic implementations of a system represented by a partially or completely specified FSM. Each deterministic submachine of the mutation machine models an implementation.
We proposed a method for generating a complete test suite which avoids the complete enumeration of nonconforming mutants. The method iteratively builds constraints specifying the mutants undetected by an incomplete (possibly empty) test suite, and uses a solution of these constraints found by a solver to determine an undetected mutant. A new test case is selected from this mutant, and an augmented constraint is derived for the next iteration step, until the obtained constraint becomes unsatisfiable.
While it enumerates all conforming mutants, which exist mostly when the specification is a partial or unreduced FSM, it does not generate all nonconforming ones. The experimental results with a prototype tool which uses the SMT solver Z3 indicate that the number of generated nonconforming mutants is only a small percentage of all mutants represented by a mutation machine.
The novel contributions of this paper are as follows. The proposed approach allows one to construct checking experiments for FSMs which are not necessarily complete and reduced, without using any state identification facility such as characterization sets, distinguishing sequences, and state identifiers, as opposed to traditional checking experiment approaches. Thus, we demonstrate that it is possible to construct checking experiments using logical encoding and constraint solving instead of classical methods based on state identification [4,9,6,22]. Moreover, test completeness is guaranteed for a predefined subset of the universe of all FSMs with a given number of states, represented by a mutation machine. Compared to all previous work on the use of mutation machines [4,9,6,20,22], we have generalized their definition to make it applicable to partially specified machines. The method proposed in [22] is only applicable to mutation machines which satisfy the following assumption: if a transition of the specification machine becomes suspicious in the mutation machine, then the latter has all possible (thus chaotic) suspicious transitions from the source state of that transition on the same input. That method also requires the specification machine to be completely specified. Compared to that work, our method is applicable to arbitrary mutation machines, while the specification machine is allowed to be partially specified.
Another interesting feature of the approach is that it is iterative: when scalability problems force a compromise between fault coverage and test length, the tester can stop with an incomplete test suite whose fault coverage can be estimated (as discussed in [20]).
The experiments indicate that the proposed approach may scale sufficiently well, though more experiments with industrial-size specifications are needed. Our current work focuses on extending the approach to FSMs with symbolic inputs and outputs [23] and eventually to a more general type of EFSM [14].