NFA-to-DFA Trade-Off for Regular Operations

Abstract. We examine the operational state complexity assuming that the operands of a regular operation are represented by nondeterministic finite automata, while the language resulting from the operation is required to be represented by a deterministic finite automaton. We get tight upper bounds $2^n$ for complementation, reversal, and star, $2^m$ for left and right quotient, $2^{m+n}$ for union and symmetric difference, $2^{m+n} - 2^m - 2^n + 2$ for intersection, $2^{m+n} - 2^n + 1$ for difference, $\frac{3}{4} 2^{m+n}$ for concatenation, and $2^{mn}$ for shuffle. We use a binary alphabet to describe witnesses for complementation, reversal, star, and left and right quotient, and a quaternary alphabet otherwise. Whenever we use a binary alphabet, it is always optimal.


Introduction
The state complexity of a regular language L, sc(L), is the smallest number of states in any deterministic finite automaton (DFA) recognising L. The state complexity of a k-ary regular operation $\circ$ is a function from $\mathbb{N}^k$ to $\mathbb{N}$ given by $(n_1, n_2, \ldots, n_k) \mapsto \max\{\mathrm{sc}(\circ(L_1, L_2, \ldots, L_k)) \mid \mathrm{sc}(L_i) \le n_i \text{ for } i = 1, 2, \ldots, k\}$.
The first results on the state complexity of basic regular operations were obtained by Maslov [11], Birget [1], and Yu et al. [15]. Holzer and Kutrib [6] considered the representation of regular languages by nondeterministic finite automata (NFAs) and defined and studied the nondeterministic state complexity of regular languages and operations in an analogous way. Jirásek Jr. et al. [8,9] investigated operational state complexity using the representation of regular languages by self-verifying and unambiguous finite automata. Notice that in all of the above-mentioned cases, the arguments and the results of regular operations are represented by the same computational model.
In this paper, we consider the NFA-to-DFA trade-off for regular operations, that is, we assume that the arguments of an operation are represented by NFAs, while the resulting language is required to be represented by a DFA. Our motivation comes from the following two streams of research.
When investigating operational state complexity on self-verifying or unambiguous automata, which are nondeterministic, the NFA-to-DFA trade-off provides an upper bound on the complexity of the corresponding operation since every DFA is self-verifying as well as unambiguous. As shown in [8,9], these upper bounds are tight for several operations.
Our second motivation comes from the research on the state complexity of combined operations that began with the paper by Salomaa et al. [12]. If a combined operation does not contain complementation, we can perform all the included operations using NFAs. Then, the NFA-to-DFA trade-off for the outermost operation can be used to get an upper bound on the desired complexity of a given combined operation.
We examine the NFA-to-DFA trade-off for complementation, intersection, union, difference, symmetric difference, reversal, star, concatenation, shuffle, and left and right quotient. For each of these operations, we get a tight upper bound on its NFA-to-DFA trade-off. To describe witnesses, we use either binary or quaternary alphabets. The binary alphabet is always optimal in the sense that the corresponding upper bounds cannot be met by any unary languages.
To conclude this introduction, let us mention that the trade-offs between different models of finite automata have been studied for the forever operator defined as $L \mapsto (\Sigma^* L^c)^c$ by Birget [2] and Hospodár et al. [7].

Preliminaries
We assume that the reader is familiar with basic notions in formal languages and automata theory. For details and all unexplained notions, the reader may refer to [14].
Let Σ be a finite non-empty alphabet. Then Σ* denotes the set of all words over Σ including the empty word ε. If u, v, w ∈ Σ* and w = uv, then u is a prefix of w. Moreover, if u ≠ w, then u is a proper prefix of w. A language over an alphabet Σ is any subset of Σ*.
If K and L are languages over Σ, then the complement of L is $L^c = \Sigma^* \setminus L$. The intersection, union, difference, and symmetric difference of K and L are defined as for arbitrary sets. Next, we consider the following regular operations: reversal, star, concatenation, shuffle, and left and right quotient.
A nondeterministic finite automaton (NFA) is a 5-tuple A = (Q, Σ, ·, s, F), where Q is a finite non-empty set of states, Σ is a finite input alphabet, s ∈ Q is the starting (or initial) state, F ⊆ Q is the set of final (or accepting) states, and $\cdot : Q \times \Sigma \to 2^Q$ is the transition function, which can be extended to the domain $2^Q \times \Sigma^*$ in the natural way. The language recognised by the NFA A is the set of words $L(A) = \{w \in \Sigma^* \mid (s \cdot w) \cap F \neq \emptyset\}$.
An NFA A is a deterministic finite automaton if |q · a| = 1 for each q ∈ Q and a ∈ Σ. In such a case we write q · a = q′ instead of q · a = {q′}, and use $q \xrightarrow{a} q'$ to denote that q · a = q′. Sometimes we permit nondeterministic automata to have more than one initial state; in such a case we use the abbreviation NNFA from [14].
A subset S of Q is reachable in an NNFA A = (Q, Σ, ·, I, F) if S = I · w for some w ∈ Σ*, and it is co-reachable if it is reachable in the reversed automaton $A^R$ obtained from A by reversing all its transitions and swapping the roles of the initial and final states.
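The reachable subsets defined above are exactly the states that a breadth-first subset construction visits. The following is a minimal sketch, not the paper's own code: the dictionary encoding of the transition function, the function name `subset_automaton`, and the 3-state example NFA for (a+b)*a(a+b) are all assumptions made for illustration.

```python
from collections import deque

def subset_automaton(alphabet, delta, initial, finals):
    """Reachable part of the subset automaton of an NNFA.

    delta maps (state, letter) to a set of states; a missing key means
    that there is no transition (assumed encoding).
    """
    start = frozenset(initial)
    states = {start}
    trans = {}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for letter in alphabet:
            # S . a is the union of q . a over all q in S
            target = frozenset(q for p in subset for q in delta.get((p, letter), ()))
            trans[(subset, letter)] = target
            if target not in states:
                states.add(target)
                queue.append(target)
    # a subset is accepting if it contains a final state of the NNFA
    accepting = {s for s in states if s & frozenset(finals)}
    return states, trans, start, accepting

# Hypothetical 3-state NFA for (a+b)*a(a+b): state 0 loops, guesses the
# second-to-last letter on a, and state 2 is final.
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
states, trans, start, accepting = subset_automaton("ab", delta, {0}, {2})
print(len(states))
```

Only reachable subsets are generated, so the number of states returned can be far below the worst case of all subsets of the state set.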
Every NNFA A = (Q, Σ, ·, I, F) has an equivalent deterministic finite automaton $D(A) = (2^Q, \Sigma, \cdot, I, \{S \in 2^Q \mid S \cap F \neq \emptyset\})$, called the subset automaton of A.
The following observation provides a sufficient condition that guarantees distinguishability of all states in a subset automaton. We use this lemma throughout the paper.
Lemma 1 (Distinguishability). Let A be an NFA such that for every state q of A the singleton set {q} is co-reachable in A. Then every two distinct states of the subset automaton D(A) are distinguishable.
Proof. Let us take two distinct subsets S and T of D(A). Without loss of generality, let q ∈ S \ T. Since the set {q} is co-reachable in A, there is a word $w_q$ that is accepted by A from the state q and rejected from every other state. It follows that in D(A), the word $w_q$ is accepted from S and rejected from T. Hence S and T are distinguishable in D(A).
The next lemma shows that every subset of the state set of the NFA from Fig. 1 is reachable in the corresponding subset automaton. It also shows that to reach every non-empty subset, the final state n − 1 may be visited only in the very last steps. This is an important property which is used later to get the results for concatenation.
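The hypothesis of Lemma 1 can be tested mechanically: reverse the transitions, start from the set of final states, and check that every singleton shows up among the reachable subsets. The sketch below assumes a dictionary encoding of the transition function; the helper names and both example automata are hypothetical.

```python
from collections import deque

def reachable_subsets(alphabet, delta, initial):
    """All subsets reachable from the set `initial` (breadth-first search)."""
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for letter in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, letter), ()))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def reverse_delta(delta):
    """Reverse every transition p -a-> q into q -a-> p."""
    rev = {}
    for (p, letter), targets in delta.items():
        for q in targets:
            rev.setdefault((q, letter), set()).add(p)
    return rev

def singletons_co_reachable(states, alphabet, delta, finals):
    """True if every singleton {q} is reachable in the reversed automaton."""
    co_reachable = reachable_subsets(alphabet, reverse_delta(delta), finals)
    return all(frozenset({q}) in co_reachable for q in states)

# Hypothetical NFA for (a+b)*a(a+b): the condition of Lemma 1 holds.
good = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
# Hypothetical unary NFA in which {1} is never co-reachable.
bad = {(0, "a"): {0, 1}}

print(singletons_co_reachable({0, 1, 2}, "ab", good, {2}))
print(singletons_co_reachable({0, 1}, "a", bad, {0, 1}))
```

The condition is only sufficient: when the check fails, the lemma simply does not apply and distinguishability has to be argued some other way.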

Complementation, Reversal, Star, and Concatenation
In this section we examine the NFA-to-DFA trade-off for basic unary operations and concatenation. We start with complementation. Its state complexity is n while its nondeterministic state complexity is $2^n$ [10]. The next theorem shows that its NFA-to-DFA trade-off is $2^n$ as well.
Theorem 3 (Complementation). Let L be a language over Σ recognised by an n-state NFA. Then $\mathrm{sc}(L^c) \le 2^n$, and the bound is tight if |Σ| ≥ 2.
Proof. Since $\mathrm{sc}(L^c) = \mathrm{sc}(L)$, the upper bound follows from the upper bound on determinization. For tightness, let L be the language recognised by the NFA A shown in Fig. 1. By Lemma 2, every subset of {0, 1, …, n − 1} is reachable in the subset automaton D(A). Since every singleton set is co-reachable in A via a word in a*, all states of D(A) are pairwise distinguishable by Lemma 1.
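The identity sc(L^c) = sc(L) used in the proof corresponds to a very short procedure: determinize, then swap accepting and rejecting states. Here is a sketch under assumed conventions (dictionary-encoded transitions, hypothetical function names, and an illustrative NFA for (a+b)*a(a+b) that is not one of the paper's witnesses).

```python
def complement_dfa(alphabet, delta, initial, finals):
    """Determinize an NNFA and flip acceptance, giving a DFA for the complement."""
    start = frozenset(initial)
    states, trans, todo = {start}, {}, [start]
    while todo:
        subset = todo.pop()
        for letter in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, letter), ()))
            trans[(subset, letter)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    # flipped acceptance: a subset accepts iff it contains NO final state,
    # so the empty set (if reachable) becomes an accepting sink
    accepting = {s for s in states if not (s & frozenset(finals))}
    return states, trans, start, accepting

def dfa_accepts(trans, start, accepting, word):
    state = start
    for letter in word:
        state = trans[(state, letter)]
    return state in accepting

# Hypothetical NFA for L = (a+b)*a(a+b)
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
states, trans, start, accepting = complement_dfa("ab", delta, {0}, {2})
print(dfa_accepts(trans, start, accepting, "bb"))
```

The flip is only valid on a complete DFA, which is why the empty set must be kept as a state rather than dropped as a dead state.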
Notice that the binary alphabet used in the previous proof is optimal since every unary n-state NFA can be simulated by a DFA of $2^{O(\sqrt{n \ln n})}$ states as shown by Chrobak [5].
Let us continue with the reversal operation. Note that it is enough to take the reversal of any DFA with one final state meeting the upper bound $2^n$ on the state complexity of reversal. Such a binary DFA was described by Šebej [13]. Here we describe a different witness with a significantly simpler proof.
Theorem 4 (Reversal). Let L be a language over Σ recognised by an n-state NFA, where n ≥ 2. Then $\mathrm{sc}(L^R) \le 2^n$, and the bound is tight if |Σ| ≥ 2.
Proof. Let L be accepted by an n-state NFA A = (Q, Σ, ·, s, F). By reversing all the transitions in A and taking F as the set of starting states and {s} as the set of final states, we obtain an n-state NNFA that accepts $L^R$. It follows that $L^R$ is accepted by a DFA with at most $2^n$ states.
To prove tightness, consider the binary language L recognised by the n-state NFA N = ({0, 1, …, n − 1}, {a, b}, ·, 0, {0, 1, …, n − 1}) shown in Fig. 2, where i … The set of initial states of $N^R$ is {0, 1, …, n − 1} and its unique final state is 0. Notice that each subset of the state set of $N^R$ can be shifted cyclically by one by reading a, and the state 0 can be eliminated from every set containing 0 by reading b. It follows that every subset of {0, 1, …, n − 1} can be reached from the initial subset {0, 1, …, n − 1} in the subset automaton $D(N^R)$. Next, every set {i} is co-reachable in $N^R$ via a word in a*, and using Lemma 1 we get that every two distinct states of $D(N^R)$ are distinguishable.
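The reachability argument can be replayed on a small automaton. The NFA below is an assumption: it is one automaton whose reversal behaves as the proof describes (a shifts subsets cyclically, b removes state 0), and it is not claimed to coincide with the witness of Fig. 2. All function names are hypothetical.

```python
from collections import deque

def assumed_witness(n):
    """Hypothetical n-state NFA over {a, b}; not claimed to be the NFA of Fig. 2."""
    delta = {}
    for i in range(n):
        delta[(i, "a")] = {(i + 1) % n}  # a moves each state one step around the cycle
        if i != 0:
            delta[(i, "b")] = {i}        # b fixes every state except 0, where it is undefined
    return delta, 0, set(range(n))       # transitions, initial state, final states

def reverse_nnfa(delta, initial, finals):
    """Reverse all transitions; the final states become the initial states."""
    rev = {}
    for (p, letter), targets in delta.items():
        for q in targets:
            rev.setdefault((q, letter), set()).add(p)
    return rev, set(finals), {initial}

def count_reachable_subsets(alphabet, delta, initial):
    start = frozenset(initial)
    seen, queue = {start}, deque([start])
    while queue:
        subset = queue.popleft()
        for letter in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, letter), ()))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return len(seen)

counts = {}
for n in range(2, 7):
    delta, initial, finals = assumed_witness(n)
    rev_delta, rev_initial, _ = reverse_nnfa(delta, initial, finals)
    counts[n] = count_reachable_subsets("ab", rev_delta, rev_initial)
print(counts)
```

The script only counts reachable subsets of the reversed automaton; pairwise distinguishability is what Lemma 1 adds on top.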
The binary alphabet used in the previous theorem is optimal for the same reason as in the case of complementation.
We continue with the star operation. While its state complexity is $\frac{3}{4} 2^n$ [11] and its nondeterministic state complexity is n + 1 [10], we show that the NFA-to-DFA trade-off for star is $2^n$.
Theorem 5 (Star). Let n ≥ 2. Let L be a language over an alphabet Σ recognised by an n-state NFA. Then $\mathrm{sc}(L^*) \le 2^n$, and the bound is tight if |Σ| ≥ 2.
Proof. Let L be recognised by an n-state NFA A = (Q, Σ, ·, s, F). Construct an NNFA N recognising $L^*$ from A as follows. First, for each transition (p, a, q) in A with q ∈ F, add the transition (p, a, s). Next, if s ∉ F, then add a new initial and final state $q_0$ to accept the empty word. Consider the subset automaton D(N). The only reachable set in D(N) containing the state $q_0$ is the initial subset $\{s, q_0\}$. All the remaining reachable sets are subsets of Q. Moreover, if a reachable set contains a final state of A, then it also contains the state s. If A has a final state different from s, then at least $2^{n-2}$ sets are unreachable in D(N), so the upper bound is $1 + \frac{3}{4} 2^n$ in this case. If F = {s}, then the construction above results in the same automaton, so $L^* = L$. In such a case, the upper bound is $2^n$.
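The construction of N in this proof translates almost line by line into code. The sketch below assumes dictionary-encoded transitions and a fresh state literally named "q0"; the function names and both example automata are illustrative assumptions, not taken from the paper.

```python
def star_nnfa(states, delta, initial, finals):
    """NNFA for L* built from an NFA for L, following the construction in the proof."""
    new_delta = {key: set(targets) for key, targets in delta.items()}
    for (p, letter), targets in delta.items():
        if targets & set(finals):
            # a transition into a final state gets a copy into the initial state s
            new_delta[(p, letter)].add(initial)
    new_states, new_initial, new_finals = set(states), {initial}, set(finals)
    if initial not in finals:
        q0 = "q0"  # fresh state, assumed not to clash with existing state names
        new_states.add(q0)
        new_initial.add(q0)
        new_finals.add(q0)
    return new_states, new_delta, new_initial, new_finals

def nnfa_accepts(delta, initial, finals, word):
    current = set(initial)
    for letter in word:
        current = {q for p in current for q in delta.get((p, letter), ())}
    return bool(current & set(finals))

# Hypothetical NFA for L = {ab}; the initial state 0 is not final.
ab = {(0, "a"): {1}, (1, "b"): {2}}
_, d_star, i_star, f_star = star_nnfa({0, 1, 2}, ab, 0, {2})

# Hypothetical NFA with F = {s}: a two-state cycle accepting (aa)*.
cycle = {(0, "a"): {1}, (1, "a"): {0}}
c_states, c_delta, c_initial, c_finals = star_nnfa({0, 1}, cycle, 0, {0})
print(c_delta == cycle)
```

The second example illustrates the last case of the proof: when the initial state is the only final state, no transition and no state is added.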
To prove tightness, consider the binary language L recognised by the n-state NFA A = ({0, 1, …, n − 1}, {a, b}, ·, 0, {0}) shown in Fig. 3, where for each state i, i … In the subset automaton D(A), the empty set is reached from the initial subset {0} by b. The reachability of all non-empty subsets is proved in exactly the same way as in the proof of Lemma 2. Since every singleton set is co-reachable in A via a word in a*, all the states of D(A) are pairwise distinguishable by Lemma 1.
The witness from the previous proof is described over a binary alphabet. It is impossible to meet the upper bound $2^n$ in the unary case since every unary n-state NFA can be simulated by a DFA with $2^{O(\sqrt{n \ln n})}$ states. The unary language recognised by the NFA from Fig. 4 provides a lower bound $(n-1)^2 + 2$; notice that this NFA is not unambiguous. We conjecture that this lower bound is tight. Our computations support this conjecture.
We conclude this section with the concatenation operation, whose state complexity is $m 2^n - 2^{n-1}$ [11] and whose nondeterministic state complexity is m + n [6]. The next theorem shows that the NFA-to-DFA trade-off for concatenation is $\frac{3}{4} 2^{m+n}$, that is, it is exponential in both m and n.
Theorem 6 (Concatenation). Let K and L be non-empty languages over an alphabet Σ recognised by an m-state and n-state NFA, respectively, with m, n ≥ 3. Then $\mathrm{sc}(KL) \le \frac{3}{4} 2^{m+n}$, and the bound is tight if |Σ| ≥ 4.
… For each transition (p, a, q) in NFA A with $q \in F_A$, add the transition $(p, a, s_B)$. The set of initial states of … The following condition holds in the subset automaton D(N): each reachable subset containing a state from $F_A$ must contain the state $s_B$. It follows that at least $2^{m+n-2}$ subsets are unreachable in D(N), and the upper bound follows.
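The counting step behind the upper bound can be spelled out as follows. This is a sketch in which f denotes an arbitrary fixed final state of A (one exists because K is non-empty); the symbol f is introduced here for illustration and is assumed to differ from $s_B$ because the two state sets are disjoint.

```latex
\[
  \bigl|\{\, X \subseteq Q_A \cup Q_B \;:\; f \in X,\ s_B \notin X \,\}\bigr| = 2^{m+n-2},
  \qquad\text{hence}\qquad
  \mathrm{sc}(KL) \;\le\; 2^{m+n} - 2^{m+n-2} \;=\; \tfrac{3}{4}\, 2^{m+n}.
\]
```

Every set counted on the left contains a final state of A without containing $s_B$, so none of them is reachable in D(N).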
For tightness, let Σ = {a, b, c, d} and let K and L be the languages over Σ recognised by the NFAs $A = (\{q_0, q_1, \ldots, q_{m-1}\}, \Sigma, \cdot_A, q_0, \{q_{m-1}\})$ and $B = (\{0, 1, \ldots, n-1\}, \Sigma, \cdot_B, 0, \{n-1\})$ shown in Fig. 5. Notice that transitions on a and b in A are the same as in the NFA in Fig. 1 and perform the identity function in B. The roles of the transitions on c and d in B are the same as the roles of a and b in A. Therefore, it follows from Lemma 2 that for every subset S of $\{q_0, q_1, \ldots, q_{m-1}\}$, there is a word $u_S$ in {a, b}* such that $q_0 \cdot_A u_S = S$, and for every subset T of {0, 1, …, n − 1}, there is a word $v_T$ in {c, d}* such that $0 \cdot_B v_T = T$. Moreover, the words $u_S$ satisfy the conditions (1), (2), (3) in Lemma 2.
To get an NFA N recognising KL from NFAs A and B, add the transitions $(q_{m-2}, a, 0)$, $(q_{m-1}, b, 0)$, and $(q_{m-1}, c, 0)$. The initial state of N is $\{q_0\}$ and its unique final state is n − 1. Let $S \subseteq \{q_0, q_1, \ldots, q_{m-1}\}$ and T ⊆ {0, 1, …, n − 1} be two subsets such that if $q_{m-1} \in S$ then 0 ∈ T. The following transitions use the words $u_S$ ∈ {a, b}* and $v_T$ ∈ {c, d}* given by Lemma 2 to show that the set S ∪ T is reachable in the subset automaton D(N): … Let us emphasise that by Lemma 2, while reading $u_S$, the final state $q_{m-1}$ of A is not visited, except for the last step if $q_0 \notin S$ and $q_{m-1} \in S$, in which case we must have 0 ∈ T, and for the last two steps if $q_0 \in S$ and $q_{m-1} \in S$, when $u_S$ ends with b, which fixes the initial state 0 of B, which must be in T. This proves the reachability of $\frac{3}{4} 2^{m+n}$ states in the subset automaton D(N).
To prove distinguishability, notice that each singleton set {j}, 0 ≤ j ≤ n − 1, is co-reachable in N via a word in c*. Next, $\{q_{m-1}\}$ is co-reachable by $c^n$, and each $\{q_i\}$, 0 ≤ i ≤ m − 2, is co-reachable via a word in $c^n a^*$. By Lemma 1, all states of D(N) are pairwise distinguishable.
The witness in the previous proof is defined over a quaternary alphabet. Consider binary languages $K = (a+b)^* a (a+b)^{m-2}$ and $L = (a+b)^{n-1}$ recognised by an m-state and n-state NFA, respectively. Then $KL = (a+b)^* a (a+b)^{m+n-3}$, the minimal DFA for which has $2^{m+n-4}$ states. This gives the lower bound $\frac{1}{16} 2^{m+n}$ in the binary case, which is asymptotically the same as the upper bound for the quaternary case.
In the unary case the upper bound is $2^{O(\sqrt{(m+n)\ln(m+n)})}$. A lower bound 1 + F(n − 1) is given by languages K = {ε} and L equal to the witness for determinization in the unary case; here F(n) is Landau's function given by $F(n) = \max\{\operatorname{lcm}(x_1, \ldots, x_k) \mid x_1 + \cdots + x_k = n\}$.
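For small arguments, Landau's function can be computed directly from its standard definition as the largest least common multiple of positive integers summing to n. The sketch below uses a simple dynamic programme over sums; the function name `landau` and the choice F(0) = 1 are conventions assumed here.

```python
from math import gcd

def landau(n):
    """Largest lcm of positive integers with sum n (Landau's function), for small n."""
    # lcms[t] collects every lcm obtainable from parts that sum to t
    lcms = [set() for _ in range(n + 1)]
    lcms[0].add(1)  # the empty sum, with lcm 1 by convention
    for total in range(1, n + 1):
        for part in range(1, total + 1):
            for value in lcms[total - part]:
                lcms[total].add(value * part // gcd(value, part))
    return max(lcms[n])

print([landau(n) for n in range(1, 11)])
```

This brute-force approach is meant only for illustrating the unary lower bound on small n; it keeps every attainable lcm and therefore does not scale to large arguments.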

Boolean Operations
Here we consider the NFA-to-DFA trade-off for four binary Boolean operations. First, we recall some notions. We call a state q of a DFA A = (Q, Σ, ·, s, F) a sink state if q · a = q for every letter a ∈ Σ. The state q is called dead if reading every word from the state q results in a non-accepting state of A.
To get an automaton recognising union, intersection, difference, or symmetric difference of two languages we use the product construction as described below.
… and a ∈ Σ, and …
If the operation inputs are given by NFAs, we first apply the subset construction to get DFAs for those inputs. Then we construct the corresponding product automaton. Notice that every subset automaton has at least one rejecting sink state, namely, the empty set. The following lemma provides upper bounds for Boolean operations on DFAs considering the presence of the rejecting sink states.
Lemma 7. Let K and L be languages over Σ accepted by DFAs with m and n states, respectively. Assume that both DFAs have a rejecting sink state. Then sc(K ∪ L) ≤ mn, sc(K ⊕ L) ≤ mn, sc(K ∩ L) ≤ mn − m − n + 2, and sc(K \ L) ≤ mn − n + 1.
Proof. For each Boolean operation $\circ \in \{\cup, \cap, \setminus, \oplus\}$, the language $K \circ L$ is recognised by the product automaton $M_\circ$, which has mn states. This gives the upper bounds for union and symmetric difference. Let $d_A$ and $d_B$ be the rejecting sink states of A and B, respectively. Then in the product automaton $M_\cap$ recognising K ∩ L, the states $(d_A, q)$ with $q \in Q_B$ and the states $(p, d_B)$ with $p \in Q_A$ are dead and can be merged into one sink state. This gives the upper bound (m − 1)(n − 1) + 1 = mn − m − n + 2. In the product automaton $M_\setminus$ recognising K \ L, the states $(d_A, p)$ with $p \in Q_B$ are dead, which gives the upper bound (m − 1)n + 1 = mn − n + 1.
Now we are ready to get tight upper bounds on the NFA-to-DFA trade-off for Boolean operations.
Theorem 8. Let K and L be languages over Σ recognised by an m-state and n-state NFA, respectively, where m, n ≥ 2. Then $\mathrm{sc}(K \cup L), \mathrm{sc}(K \oplus L) \le 2^{m+n}$, $\mathrm{sc}(K \cap L) \le 2^{m+n} - 2^m - 2^n + 2$, and $\mathrm{sc}(K \setminus L) \le 2^{m+n} - 2^n + 1$, and all these bounds are tight if |Σ| ≥ 4.
Proof. Let A be an m-state NFA recognising K and B be an n-state NFA recognising L. Consider the corresponding subset automata D(A) and D(B) with $2^m$ and $2^n$ states, respectively. Both of them have at least one rejecting sink state, namely, the empty set. Then all upper bounds follow from Lemma 7.
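The bounds of Lemma 7 can be sanity-checked on small inputs by building the product automaton and minimising it. The sketch below assumes the usual product construction (pairs of states, componentwise transitions, acceptance given by the Boolean operation); the function names, the Moore-style refinement, and the two 3-state example DFAs with rejecting sinks are all illustrative assumptions.

```python
def product_dfa(alphabet, dfa_a, dfa_b, combine):
    """Reachable part of the product of two complete DFAs (trans, start, finals)."""
    (trans_a, start_a, finals_a), (trans_b, start_b, finals_b) = dfa_a, dfa_b
    start = (start_a, start_b)
    states, trans, todo = {start}, {}, [start]
    while todo:
        p, q = todo.pop()
        for letter in alphabet:
            target = (trans_a[(p, letter)], trans_b[(q, letter)])
            trans[((p, q), letter)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    finals = {(p, q) for (p, q) in states if combine(p in finals_a, q in finals_b)}
    return states, trans, finals

def minimal_size(alphabet, states, trans, finals):
    """Number of equivalence classes after partition refinement (all states reachable)."""
    block = {s: int(s in finals) for s in states}
    while True:
        signature = {s: (block[s],) + tuple(block[trans[(s, a)]] for a in alphabet)
                     for s in states}
        ids = {sig: i for i, sig in enumerate(set(signature.values()))}
        if len(ids) == len(set(block.values())):
            return len(ids)  # no class was split, so the partition is stable
        block = {s: ids[signature[s]] for s in states}

# Hypothetical DFAs over {a, b}; state 2 is a rejecting sink in both.
dfa_a = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}, 0, {0})
dfa_b = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}, 0, {1})
m = n = 3

operations = {
    "union": lambda x, y: x or y,
    "symdiff": lambda x, y: x != y,
    "intersection": lambda x, y: x and y,
    "difference": lambda x, y: x and not y,
}
sizes = {name: minimal_size("ab", *product_dfa("ab", dfa_a, dfa_b, op))
         for name, op in operations.items()}
print(sizes)
```

Replacing m and n by $2^m$ and $2^n$ in the four bounds of Lemma 7 yields $2^{m+n}$, $2^{m+n}$, $2^{m+n} - 2^m - 2^n + 2$, and $2^{m+n} - 2^n + 1$, which are the bounds stated in the abstract for union, symmetric difference, intersection, and difference.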
For tightness, let K and L be the languages recognised by NFAs A and B from Fig. 6. Notice that transitions on a and b in A are the same as in the NFA in Fig. 1 … It follows that $a^{m-1-s} c^n$ is accepted by $M_\cup$ from (S, T) and rejected from (S′, T′).
The case of T ≠ T′ is symmetric. We can prove distinguishability for symmetric difference in exactly the same manner. We can also find the appropriate words that distinguish the desired number of states in the product automaton for intersection and difference.
Notice that the upper bounds in the previous theorem cannot be met by unary languages. The cases of binary and ternary alphabets remain open.

Shuffle, Left and Right Quotient
Here we consider three more binary operations. In all three cases the upper bound constructions are similar to the case when the operation inputs are given by DFAs. Our lower bound for shuffle is the same as the upper bound on its state complexity. The lower bound for left quotient is greater by one than its state complexity. For right quotient, the NFA-to-DFA trade-off is $2^m$, while its (nondeterministic) state complexity is m.
Theorem 9 (Shuffle). Let K and L be languages over Σ recognised by an m-state and n-state NFA, respectively, where m, n ≥ 3. Then $\mathrm{sc}(K \shuffle L) \le 2^{mn}$, and the bound is tight if |Σ| ≥ 4.
… be m-state and n-state NFAs recognising the languages K and L, respectively. Then the language $K \shuffle L$ is recognised by an mn-state NFA where for each (p, q) ∈ Q …
For tightness, notice that $\{\varepsilon\}^{-1} K = K \{\varepsilon\}^{-1} = K$. Therefore, the upper bound $2^m$ is met in both cases by L = {ε} and K equal to the binary m-state witness NFA for determinization given by Lemma 2; for distinguishability, notice that each singleton set is co-reachable in this NFA.
The binary alphabet used in the theorem above is optimal since determinization in the unary case is in $2^{O(\sqrt{n \ln n})}$.
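The quotient constructions change only the initial or the final states of the NFA for K, which is why m states suffice before determinization. The sketch below makes several assumptions: L is given as a finite list of words (a regular L in general would need an automaton-based test), the new initial set is read as the union of the sets $s_A \cdot w$ over w ∈ L, and the new final set is taken to be the states from which some word of L leads to a final state. The function names and the example NFA are hypothetical.

```python
def run(delta, subset, word):
    """Set of states reached from `subset` by reading `word`."""
    current = set(subset)
    for letter in word:
        current = {q for p in current for q in delta.get((p, letter), ())}
    return current

def left_quotient_nnfa(delta, initial, finals, words):
    """NNFA for the left quotient of K by a finite language (assumed reading)."""
    new_initial = set()
    for w in words:
        new_initial |= run(delta, {initial}, w)
    return delta, new_initial, set(finals)

def right_quotient_nnfa(states, delta, initial, finals, words):
    """NNFA for the right quotient of K by a finite language (assumed reading)."""
    new_finals = {q for q in states
                  if any(run(delta, {q}, w) & set(finals) for w in words)}
    return delta, {initial}, new_finals

# Hypothetical NFA for K = (a+b)*a(a+b)
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
states = {0, 1, 2}

print(left_quotient_nnfa(delta, 0, {2}, [""])[1])
print(right_quotient_nnfa(states, delta, 0, {2}, ["a"])[2])
```

With L = {ε} both functions return the original initial and final states, matching the observation that quotients by {ε} leave K unchanged.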

Conclusions
We investigated the NFA-to-DFA trade-off for several regular operations. Our results are summarised in Table 1. The table also displays the size of the alphabet used to describe our witnesses. Whenever we used a binary alphabet, it was always optimal in the sense that the corresponding upper bounds cannot be met by any unary languages. The table also compares our results to the known results on the state complexity and the nondeterministic state complexity of all considered operations [3,6,10,11,15].
Table 2 shows the operational state complexity on languages represented by self-verifying and unambiguous finite automata from [8,9]. The NFA-to-DFA trade-off for concatenation, shuffle, and left and right quotient is, up to one state, the same as the complexity of these operations on unambiguous automata. The same holds for left quotient on self-verifying automata.

Fig. 2. A binary witness NFA for reversal meeting the upper bound $2^n$.

Fig. 3. A binary witness NFA for star meeting the upper bound $2^n$.
… be NFAs recognising K and L, respectively, with $|Q_A| = m$, $|Q_B| = n$. Construct an NNFA N for KL from NFAs A and B as follows.
… and perform the identity function in B. The roles of the transitions on c and d in B are the same as the roles of a and b in A. Moreover, c and d perform the identity function in A. It follows from Lemma 2 that for every S ⊆ {0, 1, …, m − 1}, there is a word $u_S$ ∈ {a, b}* such that $0 \cdot_A u_S = S$, and for every T ⊆ {0, 1, …, n − 1}, there is a word $v_T$ ∈ {c, d}* such that $0 \cdot_B v_T = T$. Let $\circ \in \{\cup, \oplus, \cap, \setminus\}$. Construct the product automaton $M_\circ$ from DFAs D(A) and D(B). The initial state of $M_\circ$ is ({0}, {0}). Let S ⊆ {0, 1, …, m − 1} and T ⊆ {0, 1, …, n − 1}. Then the state (S, T) is reachable in $M_\circ$ from the initial state by the word $u_S v_T$. Hence each state of $M_\circ$ is reachable.
To prove distinguishability, first consider union. Then (S, T) is final in $M_\cup$ if m − 1 ∈ S or n − 1 ∈ T. Let (S, T) and (S′, T′) be two distinct states of $M_\cup$. Then S ≠ S′ or T ≠ T′. In the first case, without loss of generality let s ∈ S \ S′. Consider the word $a^{m-1-s} c^n$. Notice that (S, T) …
It follows that $\mathrm{sc}(K \shuffle L) \le 2^{mn}$. The m-state and n-state partial DFAs over {a, b, c, d, f} meeting this upper bound have been described in [4, Proof of Theorem 1]. Notice that the role of c and d in that proof is to reach the set $Q_A \times Q_B$ in the subset automaton. The same goal can be achieved if we replace c and d by the transitions on letter e defined as follows: $0 \cdot_A e = \{0, 1\}$, $i \cdot_A e = \{i + 1\}$ if 1 ≤ i ≤ m − 2, and $0 \cdot_B e = \{0, 1\}$, $j \cdot_B e = \{j + 1\}$ if 1 ≤ j ≤ n − 2. As a result we get a quaternary witness for shuffle.
In the unary case the upper bound for shuffle is again $2^{O(\sqrt{(m+n)\ln(m+n)})}$.
Theorem 10 (Left and Right Quotient). Let K and L be languages over an alphabet Σ recognised by an m-state and n-state NFA, respectively, where m, n ≥ 2. Then $\mathrm{sc}(L^{-1}K), \mathrm{sc}(KL^{-1}) \le 2^m$, and the bounds are tight if |Σ| ≥ 2.
Proof. Let $A = (Q_A, \Sigma, s_A, \cdot_A, F_A)$ be an m-state NFA recognising K. The language $L^{-1}K$ is recognised by the m-state NFA N obtained from A by changing the set of initial states to $\{s_A \cdot_A w \mid w \in L\}$. The language $KL^{-1}$ is recognised by the m-state NFA N obtained from A by changing the set of final states to {q ∈ Q …

Table 2. Operational complexity for self-verifying and unambiguous automata.