Inferring Finite State Machines Without Reset Using State Identification Sequences

Identifying the finite state control structure of a black box system from the traces observed in finite interaction is of great interest for many model-based activities, such as model-based testing or model-driven engineering. There are several inference methods, but all those methods assume that the system can be reset whenever necessary. In this paper, we address the issue of inferring a finite state machine FSM that cannot be reset; we propose a method, inspired by FSM-based testing generation methods. We assume classical testing hypotheses, namely that we are given a bound n on the number of states and a set W of characterizing sequences to distinguish states. To the best of our knowledge, this is the first model inference method that does not require resetting the system, and does not require an external oracle to decide on equivalence. The length of the test sequence is polynomial in n and the exponent depends on the cardinal |W| of the characterization set.


Introduction
For model-driven software engineering, model-based testing including, it is important to rely on models of the software artefacts. It is also essential that the models are up-to-date. In many contexts, however, such models are not available. In the last decade, interest has risen on methods to retrieve models from software artefacts, see, e.g., [1,2,4,7,10,12,17]. Depending on the context, goal and assumptions, various types of techniques have been considered for specification mining, reengineering or model inference. When source code is available, automated analysis of its structure can yield adequate models [5,11].
Various algorithms have been proposed for inference of finite state machines. Such methods have been used to retrieve finite state behavioural models of black box components by testing them [16,17]. Typically, such components could be accessed over a network, so that we do not even assume that the executable can be scrutinized: the system can only be observed at its interfaces [1]. This corresponds to a typical black box testing scenario, where the tester would send inputs to a system and observe its outputs. This scenario arises in many practical situations. For instance, a component has to be treated as a black box, when its internals cannot be available due to intellectual property constraints.
We assume that the System Under Test (SUT) can be modelled, at some level of abstraction on its inputs and outputs, as a Finite State Machine (FSM). FSM-based testing theory has shown that an FSM can be identified, i.e., the SUT can be tested to be proven equivalent to it, with the help of state identifying (distinguishing) sequences, constituting, e.g., a characterization set (W-set) of input sequences.
Existing inference algorithms assume that the black box can be reliably reset to its initial state. This makes the inference task somehow easier, because it is possible to start each new experiment from a known state, and explore progressively the neighbouring states. But there are cases where a black box cannot be reset, or where it is unsure that we can rely on it being restored to the same initial state. There are also cases where restarting the system completely might be very costly requiring a lot of time to reset the whole configuration (e.g., rebooting a machine, with possibly many software components to configure and reinitialize). Actually, our research was triggered by a case study from the SPaCIoS European project where we had to infer a software provided (as is common now) as a virtual machine. Although an i/o interaction in HTTP with it over the local network would require less than a millisecond, resetting the application would need more than a minute, around 10 5 more than an i/o.
In this paper, we propose an algorithm that can infer a black-box implementation without resetting it. This problem has similarities with classical testing methods, in particular the DS-method [6] and the W-method [18]. In classical testing methods, we are provided with a characterization set, and we derive a checking sequence or experiment that identifies a black box. The key difference is that classical testing methods start from a known specification machine, and just check whether the black box is equivalent (or conformant) to this specification. Therefore, those methods heavily rely on transfer sequences that make it possible to test a new state through a path known to transfer to the right state in the specification. In our context, since no specification is available, we cannot rely on known transfer sequences. Although the absence of reset had already been addressed by Hennie [6] and others in the case of a single distinguishing sequence, the task is made much harder in our context since we cannot return to a known state to compare the responses to a state of the specification. Moreover, we consider the case where the black box may not have a distinguishing sequence. This problem has been investigated in [6,8,13,14], where a characterization set is used. It turns out that generating a checking sequence (without reset) from a known specification FSM with a characterization set is very costly, due to the fact that the sequences of this set have to be applied a number of times which is proportional to some exponential on the number of states in the specification [14]. Recent work [8,13] has reduced the effort by some constant factor, which is a practical improvement; nonetheless, the complexity remains exponential.
As with other identification problems in FSM based testing, we rely on assumptions on the black box. First, we assume that an upper bound n on the number of states of the black box is known. Second, we are provided with a set W of input sequences that characterize the different states of the black box: each state produces a different set of output responses to these input sequences. Such a set could be derived from a previous version of the software or from domain-specific knowledge.
Rivest and Schapire [15] pioneered the inference of automata without reset, but their method was based on Angluin's L * algorithm [3] which assumes that an oracle can answer equivalence queries. In a typical software testing context, such an oracle cannot be provided. Our method does not require any equivalence query, and there is no oracle apart from the black box. In order to guarantee equivalence, we just assume we know an upper bound on the number of states, which is a much weaker assumption. Identifying states by their i/o responses to sequences of W is somehow similar to the set of suffixes in L * although we do not need suffix-closure contrary to L * .
Our method relies on a localizer, a procedure that can reliably bring the black box to the same state all over again to make sure that responses to all sequences from the Wset are observed from a single state. Therefore, it is possible to extend progressively the knowledge of the transitions starting from that state, while being sure that we actually come back to a known (previously learnt) state. This procedure is inspired by the construction of locating sequence used in [6,14]. The algorithm uses the localizer to start from a known state, which becomes the first learnt state. Thus, it does not require the black box to be initially in a particular state. It then tries to characterize progressively the tail state reached after the application of the localizer by repeatedly applying the sequences from W. By doing so, it will often end up in unknown states, therefore the algorithm alternates between applications of sequences from W to get more knowledge and applications of the localizer to restart from some previously learnt state. The algorithm identifies a correct model in O(p(f + p) 2 p n p+2 ) inputs, where n is the number of states of the machine, f is the size of the input set and p is the size of the W-set. Hence, the length of the identification sequence is polynomial in the number of states, although the exponent depends on the number of characterizing sequences.
The paper is organized as follows. Section 2 provides definitions and notations for our method. Section 3 describes the inference procedure, while Sect. 4 is dedicated to the localizer subroutine. Section 5 illustrates the algorithm on a small example. Proofs of termination and correctness, as well as an upper bound on complexity are in Sect. 6. Section 7 discusses assumptions and Sect. 8 concludes. δ and λ are actually mappings, i.e., dom(δ) = dom(λ) = S × I since we only consider complete machines. As M is deterministic, a tuple (s, x) 2 S × I uniquely determines a transition of M. For simplicity we use (s, x) to denote the transition, thus omitting its output and final state. We extend the transition and output functions from input symbols to input sequences, including the empty sequence ε, as usual: δ(s, ε) = s and λ(s, ε) = ε, for s 2 S; for αx 2 I + , δ(s, αx) = δ(δ(s, α), x) and λ(s, αx) = λ(s, α)λ(δ(s, α), x). An FSM M is said to be strongly connected, if for each pair of states s, s′ 2 S, there exists an input sequence α 2 I * , such that δ(s, α) = s′; α is called a transfer sequence.
Two states s, s′ 2 S are distinguishable, if there exists γ 2 I * , such that λ(s, γ) ≠λ(s′, γ). We say that γ distinguishes s and s′. Given a set H I * , states s and s′ are Hequivalent, if λ(s, γ) = λ(s′, γ) for all γ 2 H. Otherwise, i.e., if there exists γ 2 H such that λ(s, γ) ≠ λ(s′, γ), the states are H-distinguishable. We define H-distinguishability and H-equivalence of machines as a corresponding relation between their initial states. An FSM M is minimal, if all states are pairwise distinguishable. In this paper, the machines are assumed to be minimal and strongly connected. A set of inputs W is a characterization set for an FSM M if each pair of states is W-distinguishable.
A sequence of input/output pairs α 2 (IO)* is a trace. Given traces α, β, and ω, such that ω = αβ, we write ω \ β, when we need to refer to the result of deleting the suffix β from the trace ω, resulting in trace α. The notation α ≤ ω is used to refer to a prefix α of the trace ω; ω↓A, where A I [ O, denotes the projection of ω to A, obtained by removing symbols that are not in A from the trace ω. Formally, ε↓A = ε; if x 2 A, then (αx)↓A = (α↓A)x; if x 6 2 A, then (αx)↓A = (α↓A).
Given a machine M = (S, s 0 , I, O, δ, λ) with a state s i and an input sequence α 2 I * , we use tr i (α) to denote the trace from state s i such that tr i (α)↓I = α and tr i (α)↓O = λ(s i , α). Given a characterization set W, rather than naming or numbering states (such as "s i "), we may refer to a state by its state characterization Tr i (W) = {tr i (w) | w 2 W}. A state characterization Tr i can be seen as a mapping q i from W to (IO)* such that q i (w) = tr i (w). The set of all mappings Q = {q 1 , q 2 , …, q n } corresponds to the set of states of the machine. Namely, for q 2 Q and s 2 S, we write q ↔ s if 8w 2 W, q(w)↓O = λ(s, w). Inferring an unknown FSM with the same characterization set, each mapping q 2 Q can then be considered as its state. In the sequel, we use B (instead of the general M) to refer to a black-box machine, P (instead of S) to refer to its states, whereas the inferred model (also called "conjecture", in line with L * ) will have states denoted by Q.

Definitions for the Inference Method
Let B = (P, p 0 , I, O, δ, λ) denote an FSM modelling a given black-box (BB) with the characterization set W. As the BB cannot be reset, it is possible to observe only a single trace from it. We introduce the notion of labelling function C: (IO)* →Q [ {⊥}, which maps each prefix of an observed trace to a state q of Q, if it can be inferred that the machine B will be in state p such that q ↔ p after the observed trace; otherwise C maps the prefix to an unknown state, denoted ⊥.
Let ω be an observed trace. A labelling C is deterministic for ω if the following property holds: For a; a 0 2 IO ð ÞÃ; such that a x; and a 0 x; if CðaÞ ¼ Cða 0 Þ 6 ¼ ?; Obviously, a labelling should be deterministic. It should also be consistent with the state characterizations, i.e., the following property should also hold: then q w ð Þ ¼ c: We capture the notion of known (or inferred) state by stating that a labelling C should be revealing. Let B = (P, p 0 , I, O, δ, λ), a labelling C is revealing w.r.t. B if, and only if 8α 2 (IO)* (9p 2 P s.t. 8p′ 2 P (δ(p′, α↓I)) = p) ⇒ 9q s.t. C(α) = q and q ↔ p. In other words, this property forces C to be defined on any trace α that is "homing" [9], i.e., which leads unambiguously to a single state with a known state characterization in the FSM B, irrespective of the start state.
The labelling function and state characterizations are the main components of the proposed method. To ease the presentation and simplify the algorithm, we also introduce several auxiliary notations and definitions, derived from the labelling function.
A trace is verified if it is a subtrace of the observed trace ω and its start and end states are known (labelled by C). Each verified trace is the result of execution of several transitions between the known states of the FSM B. We define a set of tuples containing a verified trace together with its start and end states We will also use a subset of it to refer to single transitions, called verified transitions, and not to their sequences. Let T V, T = {(q, xo, q′) | x 2 I, o 2 O and 9σxo ≤ ω, C(σ) = q, C(σxo) = q′} be the set of verified transitions. We will use as well the term of input verified in state when we do not need to refer to the end state of a verified transition. Thus R = {(q, x) 2 Q × I | 9o 2 O, q′ 2 Q, (q, xo, q′) 2 T} is the set of inputs verified in corresponding states.
Moreover, since we will discover new states as the observed trace grows, we will have a set of known states associated with a current trace ω. Q C denotes a set of states discovered by ω such that q 2 Q C iff C(α) = q, for some α ≤ ω.
We shall be able to derive from ω a complete conjecture, i.e., the FSM B, when for all q 2 Q C , for all x 2 I, (q, x) 2 R.
We further assume that the set W is an ordered set of p sequences and use tr(w p ) to denote a subtrace produced by the BB when the last sequence of W viz. w p is applied.
Finally, K Q × (IO) + × (IO)* is used to keep track of the applications of w 2 W in a state q followed by either a transition or a trace of a sequence from W. (q, α, γ) 2 K, if there exists a trace β, such that βαγ ≤ ω, C(β) = q, α↓I 2 I [ W and γ↓I 2 W.

Inference Procedure
A localizer procedure L(ω, W) is used to ensure that we continue learning transitions from states visited before. The localizer procedure, given a current trace ω, produces an updated trace ω′, a set of traces Tr(W), and the last appended trace tr(w p ) to label the state which has produced it and is thus identified with Tr(W). Recall that a state is defined by its answers to sequences from W, i.e., by Tr(W). At the end of the localizer, we know the BB is in state δ(Tr(W), w p ), and the state reached after ω\tr(w p ) can be labelled as Tr(W).
Before defining the localizer, we present the main algorithm that calls it.
Build the conjecture from Q C and T.
The sets K, V (hence T) and R should be updated to reflect the changes in C. The update is done by applying the following rules as long as possible (fix point iterations with monotonic increase of the sets): Notice that although in this definition of updates we derive V and K sets to be consistent with their definitions, we do not need to add all elements to them. Typically, V is transitively closed, but we only need a transitive reduction of it. If (q, α, q′) 2 V and (q′, β, q′′) 2 V then in the implementation of the procedure we would not store (q, αβ, q′′) in V because we use V to find a shortest concatenation of sequences that would themselves be in V or to verify transitions (sequences of length 1).

Localizer Procedure
In our inference procedure, the main use of the localizer procedure is to ensure that when the BB is in an unknown state q, we can restart a test from a state q′ of the BB for which we can be certain that we know all its traces for all sequences from the W-set, and thus we can identify state q′ of the BB. Actually, we present a generalized procedure that can work with any set of sequences Z, where Z may not be a fully characterizing set, i.e., there could be non-equivalent states in the BB that are Z-equivalent. The localizer procedure will be defined recursively with increasing subsets Z of W.
For the design of this localizer procedure, we extend an idea that had already been investigated by Hennie [6]. The key idea is that since the number of states is finite and bounded by n, by repeatedly applying a given input sequence α we can observe at most n different output sequences. This implies that in the state reached applying n times α the BB must have reached a cycle, coming to one of the states visited after some α i . But we do not know which one because different states could still have the same output response to α. The length of the cycle is itself bounded by n. As the proof will show, it is enough to repeat α another n -1 times (so 2n -1 in total) in the worst case to identify the cycle and be able to know what would be the response from the BB to the 2n-th application and all subsequent applications of α.
This means that after 2n -1 applications of α, we are in a state p and we can apply another input sequence β, and be sure that we know the response of p to both α and β. If {α, β} is a W-set, we can continue a test from a state p that has been fully characterized, or at least from δ(p, β) which is a well-defined state. Jumping to more than two sequences is a bit more elaborate, but it is ensured by the recursive definition of the localizer. Actually, we build embedded cycles.
We illustrate the above discussion on the example in Fig. 1. This 3-state machine has no distinguishing sequence. However, states can be fully characterized by their traces for sequences from the set W = {a, b}. Suppose we start our experiment when the machine is in state 1 by sending an input a. We shall get an output 1. The localizer will repeatedly apply the input sequence a. The second and third outputs will be 0. Since we know the machine has no more than 3 states, after 3 applications of a, the fourth application will elicit an output which must be one of those previously seen, 0 or 1. In this case, it will be a 0. Now, since the last 3 outputs were 0s, we can be sure that the fifth application of a would elicit a 0. We can now apply a b, and we get 0 as output, since we were trapped in state 2. We know that just before applying the b, we were in a state that answers 0 to both a and b. Since we assume W to be a fully characterizing set, the state is distinguishable from all others. We shall call it the state {a0, b0}, which corresponds to state 2 in the example.
Note that at the end of the localizer procedure we are not in the state p that has been fully characterized, but past it: in the example of Fig. 1 we are in state 1 instead of 2, and more generally with W = {α, β} we are actually in δ(p, β). This will also be the case for our generalized localizer: at the end of the procedure, the BB will be in a state that follows the application of the last input sequence w k of the Z set, and the state characterized with the set Z will be the state reached just before applying the last w k sequence. This is not an issue because our inference procedure handles this situation. Note that this means that the sequence built by the localizer is not a homing sequence [9], although it could easily be adapted to provide a kind of homing sequence, modulo the fact that the states are initially unknown, as we discover state characterization while applying the localizer. To make a homing sequence, we would just have to add an empty sequence ε as the last sequence w k+1 of Z.
The localizer procedure is formulated for fixed alphabets I and O, and an assumed bound on the number of states in the black box n. The procedure uses a given sequence ω that has already been observed from the BB and has left it in an unknown state, and given a non-empty ordered set Z of input sequences. It returns an updated ω′ trace (ω concatenated with the new inputs applied by the localizer and observed outputs), the set of traces for Z that can be confirmed as the traces of the state p reached just before the last w k sequence of Z, and the trace suffix observed from the state p. Note that if k = 2, the list w 1 , …, w k-2 is empty, hence Z 1 = (w 1 ) and Z 2 = (w 2 ). In the for loop, we repeatedly apply the same localizer sequence L(ω, Z 1 ). We store in an array τ i the trace observed for the last element of Z 1 , i.e., tr(w k-1 ). After the n-th application, we must have reached a cycle, where τ n+1 must be equal to some previous τ i . But it could be equal to several τ i therefore we must continue another n -1 times to ensure that we have completed the cycle. In any case, after 2n -1 applications, even if the length of the cycle is unknown, since τ j+n-2 = τ n+n-2 is in the cycle, applying w k-1 instead of w k would have produced τ j+n-1 . Now we know what would be the answer of the state to w k-1 and we can apply w k instead.
We illustrate the localizer procedure on our example in Fig. 1. We start from initial state 1, with ω = ε and Z = (a, b). We have Z 1 = (a) and Z 2 = (b). Since the for loop is executed 2n -1 = 5 times, we apply w 1 = a to BB 5 times and reach the 5th step of the trace. We get the trace a1a0a0a0a0, with τ 0 = a1, τ 1 = a0, τ 2 = a0, τ 3 = a0 and τ 4 = a0. At this point, we recognize the cycle τ 2 = τ 3 and τ 3 = τ 4 (j = 2) and we know that if we apply a, we would get 0. Subsequently we apply w 2 = b and get 0.

Example of FSM Inference
We now illustrate the proposed method on the example in Fig. 1. As illustrated above, applying the localizer procedure allows us to identify a first state of the machine: C(ω \ b0) = q 0 = {a0, b0}. Thus, the prefix obtained in the 5th step is labelled by q 0 . C is updated in the third line of the inference procedure.
We then enter the main while loop in the inference procedure. As the last state of ω (6th step in the trace) is not yet labelled we proceed to the else part of the first if statement. Then γ = b0, χ = a1a0a0a0a0 and σ = b0.
After the first if statement, q = q 0 , K is empty, and we choose w = a. We apply a, observe a1 and arrive at the 7th step. We get K = {(q 0 , b0, a1)}, V = R = ∅.
Since the condition of the second if statement is false, we proceed with the third if statement, which is true. We apply again the localizer procedure, and label the state after a0a0a0a0a0 (12th step) with q 0 : C(12) = q 0 . We then restart the while loop.
We now execute the third if statement: we apply the localizer procedure and arrive at the 20th step. C(19) = q 0 and C(20) = q 1 . K is unchanged. We then restart the while loop. Since the last (20th) step is now labelled, we enter the then part of the first if statement. α = ε and we choose x = a. We apply a, observe a1 and arrive at step 21. We have σ = a1. Now q = q 1 , and we choose w = b. We apply b, observe b0 and arrive at step 22. We update K = {(q 0 , b0, a1), (q 0 , b0, b1), (q 1 , a1, a0), (q 1 , a1, b0)}.
We restart the while loop. As the last step (30) is labelled, we enter the then part of the first if statement. As all transitions from q 1 are now known, we must apply a transfer sequence to move to an unverified transition. We choose to go to q 0 with α = a1 and x = a. We apply a, observe a1, apply a again, observe a0 and arrive at step 32. σ = a0.
We now exit from the while loop and V gives us the inferred finite state machine, which is isomorphic to the FSM in Fig. 1.

Proof of Localizer
The key theorem for our approach states that the localizer will ensure that the BB reached (before the application of the last sequence from W) a state that can be labelled with the mapping returned by the localizer. Theorem 1. When Z = W, from any starting state, the localizer will return (ω′, X, β) such that C(ω′ \ β) = X and X↓I = W.
Assuming it is true for some k ≥ 1, then by definition of P we have: And by the structure of the localizer, we have: Using the induction hypothesis, we get: P(w 1 , …, w k+1 ) = = ([P(w 1 , …, w k )w k+1 ] 2n−1 P(w 1 , …, w k ). QED.
Proof. We prove this by induction on |Z|.
• If q n = q p for some p > 0, then the length of the cycle is n -1 at most and for any j < n such that for all m 2 [0, n -2], λ j+m = λ n+m then we have: λ j+n-1 = λ n+n-1 = λ(s, w k-1 ). • If the greatest j is 0, this means there is no cycle of output of length less than n, then q n = q, q n = q 0 and s = q 2n-1 = q n-1 and λ(s, w k-1 ) = λ j+n-1 since j = 0. • So in both cases we have λ(s, w k-1 ) = λ j+n-1 = τ j+n-1 ↓O.

QED.
It may be worth mentioning that taking the largest j means we identify the shortest cycle of outputs. The cycle of states entered by the implementation will have a length that is divisible by n-j: with the notation of the proof, nj divides n -j′.
We can now proceed to the proof of the Theorem 1. Recall that we denote states of the BB by their Tr(W).
Actually, in all cases, Tr(X)↓I = Z. This directly follows from the recursive definition: it holds for a singleton, and when we compute Tr for (w 1 , …, w k ) we produce a set of traces that adds some tr(w k-1 ) thus the set of traces is computed on Z 2 = (w 1 , …, w k-2 , w k ). If Z = W then by Lemma 2, the definition of C and the fact that C is revealing, we have C(ω′ \ tr(w k )) = X and X↓I = W. QED.

Proof of the Method
To prove the correctness of the method we need to demonstrate that the mappings correspond to the states of the BB. We will use the following theorem.
Theorem 2. If q 2 Q C then 9p 2 P such that q ↔ p.
This theorem follows from the definitions of C and Q C . The proof of the method itself (captured by Theorem 3) proceeds in several steps following the structure of the algorithm. In particular we must prove that all the steps that rely on an existence ("find", "choose") are well defined and will find an element, and we show that the algorithm increases monotonically the set of known states and transitions, and terminates when a conjecture can be built which is equivalent to the FSM B = (P, p 0 , I, O, δ, λ).
1. On every entry in the while loop, either we are in a known state, i.e. the BB is in a state p such that C(ω) = q ↔ p or we are at the end of a localizer, i.e., C(ω \ tr(w p )) is defined (a known state). This is trivial on the first entry in the loop, since we just applied the localizer and did not move in the BB. And at the end of a cycle in the loop, we reach either after the case C(ω) = ⊥, in which case we reapply the localizer and are again at the end of it, or C(ω) is defined. 2. Find a shortest α: since we entered the while loop, we know that 9q′ 2 Q ω and x 2 I, such that (q′, x) 6 2 R. Moreover, we are in a state q. Then either 9x′ 2 I, such that (q, x′) 6 2 R, and we just have to pick α = ε; or we know all successors of q. In that case, we build the connected graph reachable from q through verified transitions: if one of them has an unverified transition, we pick the one closest to q, i.e., that yields the shortest α. Otherwise, this would imply that there is a q′ 2 Q C such that there is a prefix 9σ ≤ ω, C(σ) = q′ which would be distinct from all the successor states of q that constitute a strongly connected component. Since all successor states and this state q′ are all paired (through the ↔ relation) to states of the BB (by Theorem 2), this would contradict the assumption that the BB is strongly connected. 3. "Else find the shortest γ". In this else branch, we are in the case where C(ω) = ⊥.
The existence of γ follows from 1: γ = w p in this case. 4. "Choose w": this comes one line after the "end if". If we followed the "else" branch, state C(χσ) 6 2 dom(C) and there is a w for which (q, σ, tr(w)) 6 2 K. If we followed the "then" branch, then (q k+1 , x) 6 2 R and σ:= xo and by definition of R, we again have a w. 5. Progress: each loop iteration adds an element to K (by the structure of the loop), either at the end of a transition (xo) or at the end of a localizer. 6. Termination: there is a finite number of transitions in the BB (hence in the set Q C by Theorem 2), each one is followed by a finite number of traces from the (finite) set W. Hence the potential number of elements in K is finite, and the process will terminate. 7. Building the conjecture: based on Theorem 2, we can build a conjecture that is isomorphic to the BB by connecting all the transitions of R when the algorithm terminates.
Theorem 3. The inference procedure terminates and yields a conjecture that is isomorphic to the minimal FSM modelling the BB.

Complexity
As in all testing procedures, interacting with the system under test contributes mostly to the cost. Therefore we analyse the worst case complexity as the length (number of inputs) of the test sequence created by the procedure. The key element is the length of the localizer. Let L be the length of L(ω, W) and |w i | the length of an element of W. As in the notations of the algorithm, p = |W| is the number of elements of the characterization set, n is the number of states, and f is the number of inputs (fan-out). Thus, L can be computed as follows.
O(L) = |w1| (2n-1)p since we can assume that for any i, we have |wi| < n: it is known that in a complete FSM with n states, any two states can be distinguished by a sequence of length no more than n. Thus, the length of the localizer is polynomial in n and exponential in p.
We apply the localizer at most once for each execution of the loop. The loop itself adds at least one element to K, thus the number of iterations of the loop is n(f + p) p. Within the loop, the extension to ω would be U = max(|α| + 1 + |w|, |γ|+|w|). |γ| + |w| is bounded by 2n, and |α| is bounded by n 2 . Then the overall complexity would be L+ n (f + p)p (L + U) bounded by (n(f + p)p + 1)(L + n 2 + n + 1). As a coarse bound, we have (n(f + p)p + 1)(n(2n -1) p + n 2 + n + 1) which is O(p(f + p) 2 p n p+2 ), again polynomial in n and exponential in p.
Of course, this is a coarse upper bound for a worst-case complexity, and it is unsure it could be reached or even approached. As can be seen from the example, as soon as states and transitions are added to Q and V, the number of uses of the localizer decreases. Given the structure of the localizer, the sequences in W would be ordered by increasing length. And the localizer could be shortened because we may recognize cycles before reaching 2n -1 iterations.
Our experiments with hundreds of randomly generated machines show that for n = 15, f = 10, inference is achieved in around 10 4 steps for p = 2 (around 7000 when |w 1 | = 1, and around 15000 when |w 1 | = 2), i.e., around 500 times (3 orders of magnitude) less than the theoretical bound. To put this in perspective, we had initially tried classical inference methods [12,16] on web applications, where a reset would typically take more than 1 min (restarting a virtual machine and restoring data configuration), whereas interacting with it over a local network would take around 1 ms per I/O pair. In such typical contexts, an inference sequence of length 10 5 would require no more time than a single reset.

Discussion
Our method assumes we know an upper bound n on the number of states of the BB and a characterization set W for it. We now discuss the impact of these assumptions. First let us remark that the localizer does not assume that W is characterizing. It can work with any set Z and just ensures that we characterize a state reached w.r.t. Z-equivalence [12]. However, an incorrect value for n could disrupt the localizer.
Actually, if either n or W is wrong (i.e., the assumptions for the BB are not satisfied), then two cases can occur: 1. The inference procedure runs to the end, but produces an FSM that is not isomorphic to a model of the BB. 2. The inference procedure detects an inconsistency (or fails to converge).
In the first case, we can either consider that the inference procedure provides a model that is approximate, and consistent with all observations: it will have produced an unrefined model of the BB that may be enough for the intended use of the model. Alternatively, we could now use the inferred model to check whether the BB conforms to this model, by some method of conformance testing, e.g., randomly walking through the model until we find a discrepancy or conclude on some level of confidence for the approximated model.
In the second case, let us first consider the case where n was an incorrect bound (too low), and this impacts the localizer. The only problem that can occur there is in the line "find the greatest j". We may fail to recognize a cycle in 2n -1 iterations. This can be easily addressed: first, we would continue iterating the for loop until we spot such an output cycle; then, we increase the value of n accordingly; at this point, we restart the whole inference procedure, because we cannot trust the previous equivalence of states.
Let us now consider the case where W would not be characterizing. This will not create any problem in the localizer, but in the main procedure. When we reapply a verified sequence α to go to a given state q k+1 or when we reapply an input x we may get a different output. This implies that the sequence we just applied distinguishes two states that we thought were equivalent. That sequence, or a part of it (up to the first divergence) could be added to W.
This can also be related to the initial assumption about W. The main issue is about how a characterizing set for a black box can be known. One possibility could be that we had such a set for a previous version of an implementation, and we take it as input of the method when we need to infer an updated implementation. It may also be possible to formulate domain-specific heuristics to identify candidate characterization sets.

Conclusion
We have presented a method that can infer a model of a non-resettable black box FSM for which we know an upper bound n on the number of states and a characterizing set W. The method is polynomial in n; the degree of the polynomial is bounded by the cardinality of W. It has now been implemented, and we are investigating its effectiveness under various settings. We are in particular interested in assessing the average complexity for various cases of characterization sets.
It might also be interesting to see how the algorithm proposed here could benefit from the improvements that have been proposed in the related problem of conformance testing for non-resettable machines. As mentioned in several parts of the paper, the algorithm lends itself to optimizations that could reduce the length of the sequence of inputs. In particular, the number of iterations in the localizer could be reduced when short cycles are recognized.
Another direction for research is on relaxing the assumptions, and providing adaptive heuristics. One direction could be to work towards inference of quotients [12].