Multiple Mutation Testing from FSM

Abstract. Fault model based testing is receiving constantly growing interest from both researchers and test practitioners. A fault model is typically a tuple of a specification, a fault domain, and a conformance relation. In the context of testing from finite state machines (FSMs), the specification is an FSM of a certain type. The conformance relation is specific to the type of FSM; for complete deterministic machines it is the equivalence relation. The fault domain is a set of implementation machines, each of which models some faults, such as output, transfer or transition faults. In traditional checking experiment theory, the fault domain is the universe of all machines with a given number of states and the input and output sets of the specification. Another way of defining fault domains, similar to the one used in classical program mutation, is to list a number of FSM mutants obtained by changing transitions of the specification. In this paper we follow the approach, proposed in our previous work, of defining the fault domain as the set of all possible deterministic submachines of a given nondeterministic FSM, called a mutation machine. The mutation machine contains a specification machine and extends it with a number of mutated transitions modelling potential faults. Thus, a single mutant represents multiple mutations and the mutation machine represents numerous mutants. We propose a method for analyzing mutation coverage of tests, which we cast as a constraint satisfaction problem. The approach is based on logical encoding and SMT solving; it avoids enumeration of mutants while still offering a possibility to estimate the test adequacy (mutation score). The preliminary experiments performed on an industrial controller indicate that the approach scales sufficiently well.


Introduction
In the area of model based testing, one of the key questions concerns a termination rule for test generation procedures. It seems to us that there are two main schools of thought concerning this rule. One of them follows the traditional approach of covering a specification model. In terms of the Finite State Machine (FSM) model, one could consider for coverage various features of an FSM, such as transitions or sequences of them, which model test purposes often used to guide and terminate test generation. The other school focuses on fault coverage and thus follows fault model based testing, see, e.g., [26,21,22,15,16,20]. Fault model based testing is receiving constantly growing interest from both researchers and test practitioners. Fault models are defined in the literature in a variety of ways [26]. In [11], we propose to define a fault model as a tuple of a specification, a fault domain, and a conformance relation. In the context of testing from finite state machines, the specification is an FSM of a certain type. A conformance relation is specific to the FSM type; for complete deterministic machines it is the equivalence relation. A fault domain is a set of implementation machines, aka mutants, each of which models some faults, such as output, transfer and transition faults.
In traditional checking experiment theory, the fault domain is the universe of all machines with a given number of states and the input and output alphabets of the specification, see, e.g., [23,9,12,13,8,14]. While this theory offers a clear understanding of what it means to have sound and exhaustive, i.e., complete, tests, it leads to tests whose number grows in the worst case exponentially with the FSM parameters. To us, this is the price to pay for considering the universe of all FSMs. Intuitively, choosing a reasonable subset of this fault domain might be a way to mitigate the test explosion effect. As an example, if one considers the fault domain of mutants that model output faults only, a test complete for this fault model is simply a transition tour. The space between these two extreme fault models has, in our opinion, received insufficient attention. In what follows, we present a brief account of what has been done in this respect.
In the area of program mutation testing, mutants are generated by modifying programs. The number of tests is limited by the number of mutants, which usually need to be compared one by one with the original program to determine the tests that kill them [3,4]. Test minimization could then be achieved via explicit enumeration of all the mutants in the fault domain, followed by solving a set cover problem.
Mutation testing in the hardware area seems to predate program mutation. An early work of Poage and McCluskey from 1964 [2] focuses on hardware faults in FSM implementations and builds a fault domain by extracting FSM mutants from modified circuits. The idea of this approach is to consolidate the comparisons of individual mutants in order to reduce the number of tests; however, mutants still need to be analyzed one by one. The approach in [1] focuses on the detection of single FSM mutations with the same test, but provides no guarantee that mutants with multiple mutations (higher order mutants) can always be killed.
Explicit mutant enumeration can be avoided by defining a fault domain as the set of all possible submachines of a given nondeterministic FSM, called a mutation machine, as proposed in our previous work [5,10,7]. The mutation machine contains a specification machine and extends it with a number of mutated transitions modelling potential faults. Mutated transitions might be viewed as faults injected in the specification machine, see, e.g., [25]. Thus, a single mutant represents multiple mutations and the mutation machine represents numerous mutants. In our previous work, methods were developed for test generation using this fault model [5,10,7]. The main idea was to adjust classical checking experiments to a fault domain smaller than the universe of all FSMs. A checking experiment, once obtained, is in fact a complete test suite. However, this approach does not offer a means of analyzing the mutation coverage of an arbitrary test suite or of individual tests.
Traditional program mutation testing uses explicit mutant enumeration to determine test adequacy, or mutation score, which is the ratio of the number of dead mutants to the number of non-equivalent mutants. We are not aware of any attempt to characterize the fault detection power of tests considering multiple mutants that avoids their enumeration.
The paper aims at solving this problem. We propose a method for analyzing mutation coverage of tests, which we cast as a constraint satisfaction problem. The approach is based on logical encoding and SMT solving; it avoids enumeration of mutants while still offering a possibility to estimate the test adequacy (mutation score). The analysis procedure can be used for test prioritization and test minimization, and could eventually lead to incremental test generation.
The remainder of this paper is organized as follows. Section 2 defines a specification model as well as a fault model. In Section 3, we develop a method for mutation coverage analysis. Section 4 reports on our preliminary experiments performed on an industrial controller. Section 5 summarizes our contributions and indicates future work.
An FSM M is reduced if any pair of its states is distinguishable, i.e., for every s1, s2 ∈ S there exists α ∈ I* such that outM(s1, α) ≠ outM(s2, α); α is called a distinguishing sequence for states s1 and s2, which is denoted s1 ≄ s2.
We also use relations between machines. Given FSMs M = (S, s0, I, O, T) and N = (P, p0,
In this paper, we use the equivalence relation between machines as the conformance relation between implementation and specification machines. A mutant B is nonconforming if it is not equivalent to A; otherwise, it is called a conforming mutant. We say that an input sequence α ∈ I* such that B ≄ A detects or kills the mutant B. The tuple ⟨A, ≃, Sub(M)⟩ is a fault model following [11]. For a given specification machine A, the equivalence relation partitions the set Sub(M) into conforming implementations and faulty ones. In this paper, we do not require the FSM A to be reduced; this implies that a conforming mutant may have fewer states than the specification A. On the other hand, we assume that no fault creates new states in implementations, hence mutants with more states than the specification FSM are not in the fault domain Sub(M).
Consider the following example.

Fig. 1. A mutation machine with the specification machine as its submachine, where mutated transitions are depicted with dashed lines and state 1 is the initial state.
The mutation machine M contains six suspicious transitions, three of which are mutated: one mutated transition represents an output fault and the other two represent transfer faults. M contains eight deterministic submachines, namely the specification machine and seven mutants, all of which share the same five trusted transitions.
As discussed in previous work [5,10,7], the mutation machine formally models test hypotheses about potential implementation faults. The mutation machine M allows a compact representation of the numerous mutants in the fault domain Sub(M). More precisely, their number is the product, taken over all pairs of a state and an input, of the number of transitions that M defines for the pair. In the extreme case considered in classical checking experiments, the fault domain is the universe of all machines with a given number of states and fixed alphabets. The corresponding mutation machine becomes in this case a chaos machine with all possible transitions between each pair of states. The number of FSMs it represents is the product of the numbers of states and outputs raised to the power of the product of the numbers of states and inputs.
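The count can be illustrated with a minimal Python sketch. The function names and the list encoding are our own assumptions made for illustration. The per-pair numbers for Fig. 1 follow from the description above (three suspicious pairs with two transitions each and five trusted transitions); the chaos machine parameters (4 states, 2 inputs, 2 outputs) are assumed to match that example.

```python
from math import prod

def count_mutants(transitions_per_pair):
    """Size of Sub(M): one transition is chosen independently for each
    (state, input) pair, so the per-pair counts are multiplied."""
    return prod(transitions_per_pair)

def chaos_size(n_states, n_inputs, n_outputs):
    """Number of machines represented by a chaos machine: every pair
    offers n_states * n_outputs transitions."""
    return (n_states * n_outputs) ** (n_states * n_inputs)

# Fig. 1: three suspicious pairs with two transitions each and
# five pairs with a single trusted transition.
fig1 = [2, 2, 2] + [1] * 5
print(count_mutants(fig1))   # 8 (the specification and seven mutants)
print(chaos_size(4, 2, 2))   # 16777216
```

The gap between the two numbers shows how strongly a mutation machine can narrow the fault domain compared with the universe of all machines.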

Mutation testing
In the domain of program mutation testing, such a test suite is often called adequate for a program (in our case, a specification machine) relative to a finite collection of programs (in our case, the set of mutants), see, e.g., [4].
Unlike in classical program mutation testing, where mutant killing tests are constructed mostly manually, in the case of deterministic FSMs tests that kill a given mutant FSM can be obtained from the product of the two machines, see, e.g., [2,1,27]. The problem can also be cast as model checking of a reachability property, as considered in several works, see, e.g., [18]. This approach can also be used to check whether a given test kills mutants, but it requires mutant enumeration.
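To make the product-based construction concrete, the following Python sketch searches the product of a specification and a single mutant breadth-first for a shortest input sequence on which their outputs differ. The toy machines are hypothetical and are not the machines of Fig. 1; the dictionary encoding (state, input) -> (output, next state) and the name killing_sequence are our assumptions.

```python
from collections import deque

def killing_sequence(spec, mutant, init, inputs):
    """Shortest input sequence on which two complete deterministic FSMs
    produce different outputs, or None if they are equivalent."""
    start = (init, init)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (s, m), seq = queue.popleft()
        for x in inputs:
            out_s, next_s = spec[(s, x)]
            out_m, next_m = mutant[(m, x)]
            if out_s != out_m:
                return seq + x          # outputs differ: mutant is killed
            pair = (next_s, next_m)
            if pair not in seen:        # visit each product state once
                seen.add(pair)
                queue.append((pair, seq + x))
    return None

# Hypothetical three-state specification over inputs a, b and outputs 0, 1.
spec = {
    (1, "a"): (0, 2), (1, "b"): (0, 1),
    (2, "a"): (0, 3), (2, "b"): (1, 1),
    (3, "a"): (1, 3), (3, "b"): (0, 1),
}
mutant = dict(spec)
mutant[(2, "a")] = (0, 2)               # transfer fault: 2 -a/0-> 2

print(killing_sequence(spec, mutant, 1, "ab"))   # aaa
print(killing_sequence(spec, spec, 1, "ab"))     # None
```

Each mutant needs its own product here, which is exactly the one-by-one analysis that the approach developed below avoids.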
In this work, we develop an analysis approach that avoids mutant enumeration while still offering a possibility to estimate the test adequacy (mutation score).

Distinguishing automaton
Tests detecting mutants of the specification are presented in a product of the specification and mutation machines obtained by composing their transitions as follows.
We illustrate the definition using the specification and mutation machines in Fig. 1. Notice that for any nonconforming mutant there exists an input sequence of length at most n² that detects it, where n is the number of states of the specification machine, since the distinguishing automaton has no more than n² states.
At the same time, not every word of the language detects a mutant. An input sequence α ∈ LD triggers several executions in the distinguishing automaton D, which are defined by a single execution in the specification machine A and some execution in the mutation machine M, both triggered by α. To represent a mutant, the latter must be deterministic. Such a deterministic execution of the mutation machine M, defining (together with the execution of A) an execution of the distinguishing automaton D to the sink state, is called α-revealing. Input sequences triggering revealing executions enjoy the nice property of being able to detect mutants.

Theorem 2.
Given an input sequence α ∈ I* such that α ∈ LD, an α-revealing execution includes at least one mutated transition; moreover, each mutant which has this execution is detected by the input sequence α.
Given an input sequence α ∈ LD, the question arises how all the mutants (un)detected by this input sequence can be characterized. We address this question in the next section.

Mutation coverage analysis
Consider an input sequence α ∈ I* which detects a nonconforming mutant by triggering α-revealing executions. Analyzing these executions, we can determine all mutated transitions involved in each of them. This analysis can be performed by using a distinguishing automaton constrained to a given input sequence. Let α ∈ I* and Pref(α) be the set of all prefixes of α. We define a linear automaton (Pref(α), ε, I, Dα), such that each prefix of α is a state, and (,
We illustrate the definition using the input sequence α = baaba for the specification and mutation machines in Fig. 1. Notice that the sequence hits all the mutated transitions in the mutation machine. The resulting α-distinguishing automaton for A and M is shown in Fig. 3.
3, a, 0, 3)(3, b, 0, 3)(3, a, 0, 3).
The suspicious transitions are in bold. The executions are deterministic and include two mutated transitions (3, a, 1, 3) and (3, b, 0, 3). The third mutated transition (4, a, 1, 2) is in the execution that does not lead to the sink state ∇. Hence, the input sequence baaba detects any mutant with two out of three mutated transitions.
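The analysis of the executions triggered by a fixed input sequence can be sketched in Python as follows. This is only an illustration under our own assumptions: the mutation machine is encoded as a dictionary (state, input) -> list of (output, next state), with the unaltered transition listed first; the toy machine is hypothetical and is not the one in Fig. 1; choices are numbered from 1 so that 1 denotes the unaltered transition.

```python
def revealing_executions(mutation, init, alpha):
    """For each alpha-revealing execution, return the choices
    {(state, input): index} it makes among suspicious transitions."""
    found = []

    def explore(spec_state, mut_state, pos, choices):
        if pos == len(alpha):
            return                      # alpha exhausted, sink not reached
        x = alpha[pos]
        spec_out, spec_next = mutation[(spec_state, x)][0]
        options = mutation[(mut_state, x)]
        for idx, (out, nxt) in enumerate(options, start=1):
            # Determinism: a pair used before must repeat the same choice.
            if choices.get((mut_state, x), idx) != idx:
                continue
            new_choices = dict(choices)
            if len(options) > 1:        # record suspicious pairs only
                new_choices[(mut_state, x)] = idx
            if out != spec_out:
                found.append(new_choices)   # wrong output: sink reached
            else:
                explore(spec_next, nxt, pos + 1, new_choices)

    explore(init, init, 0, {})
    return found

# Hypothetical toy mutation machine with two mutated transitions.
M = {
    (1, "a"): [(0, 2)],
    (1, "b"): [(0, 1)],
    (2, "a"): [(0, 3), (0, 2)],         # second entry: transfer fault
    (2, "b"): [(1, 1)],
    (3, "a"): [(1, 3), (0, 3)],         # second entry: output fault
    (3, "b"): [(0, 1)],
}
print(revealing_executions(M, 1, "aaa"))
# [{(2, 'a'): 1, (3, 'a'): 2}, {(2, 'a'): 2}]
```

In this toy case the input aaa triggers two revealing executions, whereas its prefix aa triggers none, since neither fault has yet produced a wrong output.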
As stated above, any mutant whose transition relation satisfies one of these constraints is detected by the input sequence babaaba or its prefix, since such a mutant produces a wrong output sequence. On the other hand, a mutant that does not satisfy any of them escapes detection by this input sequence. To characterize these mutants, we formulate constraints which exclude all the executions of detected mutants by considering the negation of the disjunction of the constraints for all the triggered revealing executions. The resulting constraint becomes a conjunction of negated constraints of the executions.
For instance, the negated first constraint is (
To formalize the above discussion, we cast the execution analysis as a constraint satisfaction problem, using auxiliary variables to specify the choices between suspicious transitions. Let T1, T2, …, Tm be the sets of suspicious transitions, where unaltered transitions are the first elements and the remaining elements of each set are lexicographically ordered. We introduce auxiliary variables z1, z2, …, zm, such that variable zi represents the suspicious set Ti. The domain of the variable zi is Di = {1, 2, …, |Ti|}, such that zi = 1 represents the unaltered transition in the set Ti and the other values correspond to mutated transitions. We use the comparison operators {=, ≠} and the logical operators AND (∧) and OR (∨) in constraint formulas.
Each execution of a mutation machine that involves suspicious transitions yields assignments to the variables representing these transitions, which are expressed as a constraint formula, namely the conjunction of the individual assignments (zi = c), where c ∈ Di. The negated constraint formula then becomes the disjunction of the individual constraints (zi ≠ c).
A set of revealing executions triggered by one or more input sequences is then excluded by the conjunction of these disjunctions of individual constraints.
In The constraint formula becomes:
Clearly, the formula always has a solution in which the values of the variables select the unaltered transitions, i.e., the specification machine, but we need a solution, if it exists, which has at least one mutated transition. To this end, we add the constraint (z1 ≠ 1) ∨ (z2 ≠ 1) ∨ (z3 ≠ 1), which excludes the solution defining the specification machine.
The final constraint formula is
To solve it, we use the SMT solver Yices [23], which finds the solution (z1 = 2), (z2 = 1), (z3 = 1). The solution defines a mutant with the single mutated transition (3, a, 1, 3). The mutant is nonconforming, which can be verified with the help of a distinguishing automaton obtained for the specification machine and the mutant. This means that the input sequence babaaba does not detect the mutant defined by the solution. To ensure its detection we have two options: to add a new input sequence, or to try to extend the input sequence babaaba until it detects the remaining mutant. The latter option avoids using the reset operation in testing, which is required by the former option.
Following the first option, we notice that the input sequence which detects the escaped mutant is baa, already obtained in the example of the α-distinguishing automaton in Fig. 3, where α = baaba. Considering the revealing execution (1, b, 0, 2)(2, a, 0, 3)(3, a, 1, 3) triggered by its prefix baa, we generate an additional constraint (z1 ≠ 2), which prevents the suspicious transition (3, a, 1, 3) from being chosen, and add it to the final constraint formula, which then has no solution. The set {babaaba, baa} is therefore a complete test suite for the specification machine A and mutation machine M in Fig. 1.
Following the second option, we find that it is possible to extend the input sequence babaaba, which leaves the specification machine in state 3, with the input a to detect the mutated transition (3, a, 1, 3). As before, we add the constraint (z1 ≠ 2) and the final constraint formula has no solution. The set {babaabaa} is also a complete test suite.
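The effect of the added constraint can be replayed on a small scale with the Python sketch below. It enumerates assignments only because three binary variables make this trivial; it is not the solver-based procedure of this paper. The variable names follow the text, but the two revealing executions attributed to babaaba are hypothetical placeholders, chosen only so that the outcome agrees with the solution reported above; the actual executions depend on Fig. 1.

```python
from itertools import product

def surviving_assignments(domains, revealing):
    """Assignments that avoid every revealing execution and differ from
    the specification (all variables equal to 1)."""
    names = sorted(domains)
    survivors = []
    for values in product(*(range(1, domains[z] + 1) for z in names)):
        z = dict(zip(names, values))
        is_spec = all(v == 1 for v in values)
        detected = any(all(z[k] == c for k, c in ex.items())
                       for ex in revealing)
        if not is_spec and not detected:
            survivors.append(z)
    return survivors

domains = {"z1": 2, "z2": 2, "z3": 2}
# Hypothetical revealing executions of babaaba (placeholders).
after_first_test = [{"z2": 2}, {"z3": 2}]
print(surviving_assignments(domains, after_first_test))
# [{'z1': 2, 'z2': 1, 'z3': 1}]

# Revealing execution of baa from the text: it uses (3, a, 1, 3), i.e. z1 = 2.
after_second_test = after_first_test + [{"z1": 2}]
print(surviving_assignments(domains, after_second_test))
# []
```

An empty result corresponds to a constraint formula without solution, that is, to a complete test suite.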
This example indicates that various test generation strategies could be investigated, complementing the checking experiment and checking sequence approaches. The latter allows one to avoid using multiple resets in testing. Notice that a classical checking experiment for this example derived by using, e.g., the W-method [12,13], contains many more input sequences. Moreover, the specification machine in Fig. 1 has no distinguishing sequence, which is usually required to generate a checking sequence. For this reason the existing methods cannot construct a single test; the example indicates, however, that the mutation analysis allows us to do so. We leave the detailed elaboration of a test generation method for future work and formulate in this paper a procedure for mutant coverage analysis.
The procedure uses as inputs a test suite TS for a specification machine A and mutation machine M and consists of the following steps:
1. For each input sequence α ∈ TS:
(a) Determine the α-distinguishing automaton.
(b) Find all executions leading to the sink state.
(c) Determine the α-revealing executions of the mutation machine.
(d) Build the disjunction of constraints excluding the α-revealing executions.
2. Build the conjunction of the obtained disjunctions and add the constraint that excludes the solution defining the specification machine.
3. Solve the constraint formula by calling a solver.
4. If it finds no solution, terminate with the message "TS is complete"; otherwise check whether the mutant defined by a solution is conforming.
5. If it is nonconforming, terminate with the message "TS is incomplete"; otherwise add the constraint that excludes the solution defining the conforming mutant and go to Step 3.
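Steps 2 to 5 can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the prototype's implementation: the function solve is a naive stand-in that enumerates assignments where the procedure would call an SMT solver, the revealing executions are assumed to be already computed in Step 1, and the conformance check is passed in as a hypothetical callback is_conforming.

```python
from itertools import product

def solve(domains, clauses):
    """Stand-in for a solver call. A clause is a list of pairs
    (variable, value) read as a disjunction of variable != value."""
    names = sorted(domains)
    for values in product(*(range(1, domains[z] + 1) for z in names)):
        z = dict(zip(names, values))
        if all(any(z[v] != c for v, c in clause) for clause in clauses):
            return z
    return None

def analyse(domains, revealing, is_conforming):
    # Steps 1(d) and 2: one clause per revealing execution, plus the
    # clause excluding the specification (all variables equal to 1).
    clauses = [list(ex.items()) for ex in revealing]
    clauses.append([(z, 1) for z in domains])
    while True:
        solution = solve(domains, clauses)            # Step 3
        if solution is None:
            return "TS is complete", None             # Step 4
        if not is_conforming(solution):
            return "TS is incomplete", solution       # Step 5
        clauses.append(list(solution.items()))        # exclude this mutant

# Toy run with two binary variables and one revealing execution.
domains = {"z1": 2, "z2": 2}
revealing = [{"z2": 2}]
print(analyse(domains, revealing, lambda z: False))
# ('TS is incomplete', {'z1': 2, 'z2': 1})
print(analyse(domains, revealing, lambda z: True))
# ('TS is complete', None)
```

In the second call the only remaining solution is declared conforming, so it is excluded and the next solver call finds nothing, which is how the loop between Steps 3 and 5 terminates.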
The main steps of the procedure have already been discussed and illustrated on the examples, except for the last two steps, which deserve more explanation. Constraint solvers normally provide a single solution if one exists. An extra constraint prevents the solution from pointing to just the specification machine, but the solution found may correspond to a conforming mutant. In the domain of general mutation testing, the problem of dealing with mutants equivalent, i.e., conforming, to the specification is well understood. In testing from an FSM, most approaches assume that the specification machine is reduced, so conforming mutants are isomorphic machines. Checking FSM equivalence is based on an FSM product. Notice that the proposed approach does not require the specification machine to be reduced.
The complexity of the proposed method is defined by the number of constraints. We expect that the method scales well, since the recent advances in solving techniques drastically improve their scalability [23,24]. The number of constraints for a single execution is limited by the number of states of a mutation machine, but the number of executions increases with the number of mutated transitions. On the other hand, the number of executions of the distinguishing automaton which do not end up in the sink state grows with the number of mutated transitions, as faults may compensate each other. These executions are not revealing and do not contribute to the number of constraints. In Section 4 we present the results of our preliminary experiments performed on an industrial controller to assess the scalability of the approach.

Applications
The proposed mutation coverage analysis approach allows one to check whether a given test suite is complete. A logical formula constructed by the proposed method represents the coverage of the test suite for a given fault model. If the test suite is found to be incomplete, the question arises of how its quality in terms of fault coverage can be characterized. In traditional software mutation testing, the fault detection power of tests is characterized by the mutation score, which is the ratio of the number of killed mutants to the number of non-equivalent mutants. Note that the number of all possible mutants remains unknown there and the mutation score is determined based on a limited set of generated mutants. By contrast, in our approach the total number of mutants can always be determined using the formula given in Section 2.2. Moreover, while the mutation analysis method avoids complete mutant enumeration, it does generate conforming mutants while searching for nonconforming ones.
The enumeration of conforming mutants is achieved by adding constraints to a logical formula excluding repeated generation of already found mutants.
In the same vein, our method can be enhanced to generate and enumerate (at least partially) undetected nonconforming mutants. Once a nonconforming mutant is given by a solution found by an SMT solver and the method terminates declaring the test suite to be incomplete, we may continue this process by adding a constraint excluding its repeated generation. As a result, a list of nonconforming mutants can be obtained. Two extreme cases of incomplete tests are worth discussing here.
First, a given test suite may have no detection capability at all. This property is in fact detected very early by the method: in this case no α-distinguishing automaton has a sink state reachable from its initial state, the tests generate no constraints, and the method can terminate at this step since there is no need to call a solver. No mutant in Sub(M) is killed and the score is zero.
Second, a given test suite is "almost" complete and kills most of the mutants in Sub(M). In this case, the process of nonconforming mutant generation does not take much time and, once terminated, yields the number of conforming mutants c as well as the number of surviving nonconforming ones n. The mutation score, i.e., the ratio of the number of killed mutants to the number of nonconforming mutants, is then computed from c, n and |Sub(M)|. It is worth noting that the way the mutation score is determined is completely different from that in software mutation testing, as our method generates mutants based on a given test suite and not the other way around.
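A possible reading of this computation is sketched below in Python. The function name and the convention that the count of conforming machines includes the specification machine itself are our assumptions, not a formula taken from the text.

```python
def mutation_score(total, conforming, survived):
    """Killed nonconforming mutants divided by all nonconforming mutants.

    total      -- |Sub(M)|, all deterministic submachines
    conforming -- conforming submachines, assumed here to include the
                  specification machine itself
    survived   -- nonconforming mutants not killed by the test suite
    """
    nonconforming = total - conforming
    if nonconforming == 0:
        raise ValueError("no nonconforming mutants in the fault domain")
    return (nonconforming - survived) / nonconforming

# Example of Fig. 1 with the single test babaaba: 8 submachines, only the
# specification is conforming, and one nonconforming mutant survives.
print(mutation_score(8, 1, 1))   # 6/7, roughly 0.857
print(mutation_score(8, 1, 0))   # 1.0 for a complete test suite
```

Under these assumptions the single test babaaba of the running example kills six of the seven mutants.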
When a given test suite is "far" from being complete the number of survived nonconforming mutants can explode especially when a mutation machine is close to a complete chaos machine which represents the complete universe of FSMs. In this situation one possible solution to cope with the mutant explosion problem is to terminate generating nonconforming mutants once their number reaches a predefined maximum, e.g., a percentage of |Sub(M)| or the time period allocated for mutation analysis ends. The obtained score is an (optimistic) estimation of an upper bound of the actual mutation score.
The proposed procedure could also be used for test minimization by defining a subsumption relation between tests based on a comparison of the logical formulas generated from them. Tests subsumed by other tests can always be removed from the original test suite. Similarly, the generated formulas can be used to prioritize tests when needed, see, e.g., [28].

Experimental results
In this section we report on a prototype tool implementing the proposed approach and its use on a case study of an FSM model of an automotive controller of industrial size.

Prototype tool
The prototype tool takes as inputs a mutation machine and a test suite, both described in text format. The inputs are parsed with an ANTLR-based module [30] to build an internal representation of the two objects. The mutation analysis algorithm manipulates these representations to build α-distinguishing automata, determine revealing executions of the mutation machine and generate constraints for the Yices SMT solver [23]. The solver is used as a backend to decide the satisfiability of the constraints. The tool parses the output of Yices to extract a solution, if one is found, and to build a mutant from it. The prototype can also be used with other SMT solvers compatible with SMT-LIB 2.0.
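We do not reproduce the exact input that the prototype passes to the solver. As an assumed illustration only, the Python sketch below emits an SMT-LIB 2 script that encodes each variable zi as a bounded integer and each excluded execution as a disjunction of inequalities; the function name to_smtlib and the clause representation are hypothetical.

```python
def to_smtlib(domains, clauses):
    """Render bounded integer variables and clauses as SMT-LIB 2 text.
    A clause is a non-empty list of (variable, value) pairs, read as a
    disjunction of variable != value."""
    lines = ["(set-option :produce-models true)", "(set-logic QF_LIA)"]
    for z, size in sorted(domains.items()):
        lines.append(f"(declare-fun {z} () Int)")
        lines.append(f"(assert (and (>= {z} 1) (<= {z} {size})))")
    for clause in clauses:
        literals = [f"(not (= {z} {c}))" for z, c in clause]
        # A single literal is asserted directly, without a unary "or".
        body = literals[0] if len(literals) == 1 else "(or " + " ".join(literals) + ")"
        lines.append(f"(assert {body})")
    # get-model is meaningful only when check-sat answers sat.
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)

script = to_smtlib({"z1": 2, "z2": 2},
                   [[("z2", 2)], [("z1", 1), ("z2", 1)]])
print(script)
```

An answer of unsat then corresponds to a complete test suite, while a model read back from the solver identifies an undetected candidate mutant.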

Case study
In our experiments, we use as a case study an automotive controller of the air quality system, which we also used in our previous work [29]. The functionality of the controller is to set an air source position depending on its current state and a current input from the environment. The controller is initially specified as a hierarchical Simulink Stateflow model. Fig. 5 gives an overview of the model, which is composed of three super-states s1, s2 and s23 and 13 simple states. Each super-state is composed of states and transitions. The initial state is the simple state s3. To obtain an FSM, we introduced an input alphabet replacing transition guards and flattened the hierarchical machine. We have identified 24 abstract inputs and two outputs. The resulting FSM has 14 states, since we added one extra state to the given 13 simple states to model a branching behavior implemented with C code in the original state. It has 24 × 14 = 336 transitions.
The mutation machine was constructed from the following assumption about potential implementation faults. These faults may occur in outgoing transitions from any of the simple states in two super-states, namely s2 and s23, for four inputs, as Table 1 shows. The obtained mutation machine has 46 mutated transitions. The formula in Section 2 gives the number of mutants as 3^12 × 2^17 = 69,657,034,752, including the specification machine.

       s21   s22   s231  s232  s233  s234  s235
a2     3     3     3     3     3     3     3
a4     2     2     2     2     2     2     2
a14    1     1     3     3     3     3     3
a16    1     1     4     4     4     4     4

Tab. 1. The numbers of transitions for some pairs of states and inputs in the mutation machine (for the remaining pairs no mutated transitions were added).
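Both figures can be checked directly from Table 1, reading each entry as the number of transitions for the corresponding state-input pair and assuming that every remaining pair contributes a factor of 1:

```latex
\begin{align*}
|Sub(M)| &= \underbrace{3^{7}}_{a2}\cdot\underbrace{2^{7}}_{a4}\cdot
            \underbrace{1^{2}\cdot 3^{5}}_{a14}\cdot\underbrace{1^{2}\cdot 4^{5}}_{a16}
          = 3^{12}\cdot 2^{17} = 531\,441 \cdot 131\,072 = 69\,657\,034\,752,\\
\#\,\text{mutated transitions} &= (21 + 14 + 17 + 22) - 28 = 46.
\end{align*}
```

The second line sums the table entries row by row and subtracts the 28 unaltered transitions, one for each listed pair.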

Mutation analysis
To perform the mutation analysis, we needed a test suite. It could be generated randomly; however, we found it difficult to obtain random tests that hit suspicious transitions in this case study, since 26 out of 336 transitions of the specification machine become suspicious in the mutation machine. We decided to use the output of an early prototype of a test generation tool (which is work in progress) as an input for the mutation analysis tool. The tool generates test cases one by one, so that the mutation analysis tool processes a test suite of increasing size. The process terminates once the current test suite is found to be complete. In this experiment, the test suite was found to be complete when it had 31 test cases. The length of the test cases varies from 4 to 25 and the number of revealing executions triggered by each of them varies from 1 to 13. In the last, 31st execution, Yices was given a formula of 69 clauses, for which it found no solution, meaning that the test suite is complete for the given mutation machine. The mutation analysis process took less than one minute on a desktop computer with the following settings: 3.4 GHz Intel Core i7-3770 CPU, 16.0 GB of RAM, Yices 2.4.1, and ANTLR 4.5.1. The fact that the tool was able to determine that the given test suite kills each nonconforming mutant out of 69,657,034,752 possible mutants indicates that the approach scales sufficiently well on a typical automotive controller even when the number of mutants is large. In this experiment, we varied only the number of tests (from 1 to 31), hence more experiments varying the specification as well as the mutation machines are needed to assess the tool scalability.

Conclusions
In this paper we focused on fault model based testing, assuming that a fault model is given as a tuple of a specification FSM, equivalence as a conformance relation, and a fault domain. A fault domain is a set of implementation machines, aka mutants, each of which models some faults, such as output, transfer or transition faults. To avoid their enumeration, we define the fault domain as the set of all possible submachines of a given nondeterministic FSM, called a mutation machine, as we did in our previous work. The mutation machine contains a specification machine and extends it with a number of mutated transitions modelling potential faults. Thus a single mutant represents multiple mutations and the mutation machine represents numerous mutants. In the area of mutation testing we could not find any attempt to analyze the fault detection power of tests considering multiple mutants that avoids their enumeration. We proposed a method for analyzing mutation coverage of tests, which we cast as a constraint satisfaction problem. The method relies on the notion of a distinguishing automaton, which is a product of the specification and mutation machines. To analyze the mutation coverage of a single input sequence, we define a distinguishing automaton constrained by this sequence. This allows us to determine all mutant-revealing executions that are triggered by the input sequence. The executions are then used to build constraint formulas to be solved by an existing solver, Yices in our experiments. The approach avoids enumeration of mutants while still offering a possibility to estimate the test adequacy (mutation score).
The preliminary experiments performed on an industrial controller indicate that the approach scales sufficiently well. We are planning to extend the approach to Extended FSMs [17] using mutation operators already defined for this type of FSMs.