The Complexity of Languages Resulting from the Concatenation Operation



Introduction
Iwama et al. [4] posed the question of whether there always exists a minimal nondeterministic finite automaton (NFA) of n states whose equivalent minimal deterministic finite automaton (DFA) has α states, for all integers n and α satisfying n ≤ α ≤ 2^n. The question was also considered by Iwama et al. [5], and answered positively in [9] for a ternary alphabet. However, in the unary case, the existence of holes, so-called "magic numbers", was proved by Geffert [1]. The binary case is still open.
The same problem on sub-regular language families was studied by Holzer et al. [2]. It turned out that the existence of non-trivial magic numbers is rare, and that the ranges of possible complexities are usually contiguous. One interesting exception was obtained by Čevorová [18]. She studied the star operation on unary regular languages, and proved that there are two linear segments of magic numbers in the range from 1 to (n − 1)^2 + 1, that is, of values that cannot be met by the state complexity of the star of a unary language accepted by a minimal n-state DFA. On the other hand, she proved that for the square operation in the unary case no magic numbers exist [19]. Another example of the existence of magic numbers, for symmetric difference NFAs, was presented by Zijl [17], but they could possibly be trivial.
A similar problem for the reversal, star, and concatenation operations was studied in [7,8], where it was shown that for all three operations the whole range of possible complexities up to the known upper bounds can be produced using an exponential alphabet.
The result for reversal and star was improved in [10,14] by showing that a linear alphabet is enough to produce the whole range of complexities.
In this paper we complement these results, and show that a linear alphabet can also be used for the concatenation operation. We prove that for all m, n, and α with 1 ≤ α ≤ f(m, n), where f(m, n) is the state complexity of the concatenation operation, there exist a minimal m-state DFA A and a minimal n-state DFA B, both defined over an alphabet Σ with |Σ| ≤ 2n + 4, such that the minimal DFA for the language L(A)L(B) has exactly α states.
To get this result, we describe three constructions which, given an m-state DFA A and an n-state DFA B, produce an m-state DFA A_i and an (n + 1)-state DFA B_i for i = 1, 2, 3 by adding a new state to B and by adding the transitions on two new symbols. Moreover, if the state complexity of the concatenation of L(A) and L(B) is α, then the state complexity of the concatenation of L(A_i) and L(B_i), i = 1, 2, 3, is 2α, 2α − 1, and α + 1, respectively. As a result, we get a contiguous range of complexities from m + n + 1 up to the known upper bound for a linear alphabet. To get complexities from 1 to m + n − 1, we use a known result from [8]. We deal with the value m + n separately, and use a binary alphabet here.
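To illustrate how the three growth rules 2α, 2α − 1, and α + 1 combine, the following Python sketch decomposes a target complexity α into a base value for n = 2 followed by a sequence of constructions. It is an illustration under stated assumptions only: it takes m ≥ 3, uses the bound f(m, n) = (m − 1)2^n + 2^{n−1} for n ≥ 2 and the base range m + 3 ≤ α ≤ 4m − 2 for n = 2, and the rule for choosing a construction at each step is a greedy choice made for this sketch, not necessarily the case split used in the proofs. The names f, plan, and replay are hypothetical.

```python
def f(m, n):
    """Upper bound on sc(L(A)L(B)) for an m-state A and an n-state B."""
    return m if n == 1 else (m - 1) * 2**n + 2 ** (n - 1)


def plan(m, n, alpha):
    """Return (base, steps): a target for n = 2 and the constructions to apply.

    Assumption of this sketch: m >= 3 and m + n + 1 <= alpha <= f(m, n).
    """
    assert m >= 3 and n >= 2 and m + n + 1 <= alpha <= f(m, n)
    steps = []
    while n > 2:
        low = m + n  # smallest target allowed one level down: m + (n - 1) + 1
        if alpha % 2 == 0 and alpha // 2 >= low:
            alpha, rule = alpha // 2, 1        # Construction 1: beta -> 2*beta
        elif alpha % 2 == 1 and (alpha + 1) // 2 >= low:
            alpha, rule = (alpha + 1) // 2, 2  # Construction 2: beta -> 2*beta - 1
        else:
            alpha, rule = alpha - 1, 3         # Construction 3: beta -> beta + 1
        steps.append(rule)
        n -= 1
    return alpha, steps[::-1]  # steps listed in the order they are applied


def replay(base, steps):
    """Apply the growth rules to the base value, in order."""
    for rule in steps:
        base = {1: 2 * base, 2: 2 * base - 1, 3: base + 1}[rule]
    return base
```

For every admissible α, replaying the returned steps from the returned base value gives α back, and each intermediate value stays in the range m + k + 1 to f(m, k) for the corresponding number k of states of the second automaton.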
The paper is organized as follows. The next section contains some definitions and preliminary results. In Section 3, we recall known results concerning the state complexity of concatenation. In Section 4, we prove that the range of possible complexities for the languages resulting from the concatenation operation is contiguous from 1 up to the known upper bound, and we show that a linear alphabet is enough for this. Section 5 contains some concluding remarks.

Preliminaries
In this section we give some basic definitions and preliminary results. For details, the reader may refer to [3,13,15].
Let Σ be a finite alphabet of symbols. Then Σ* denotes the set of strings over Σ including the empty string ε. The length of a string w is denoted by |w|, and the number of occurrences of a symbol a in a string w is denoted by #_a(w). A language is any subset of Σ*. The concatenation of languages K and L is the language KL = {uv | u ∈ K and v ∈ L}. The cardinality of a finite set A is denoted by |A|, and its power-set by 2^A.
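A nondeterministic finite automaton (NFA) is a quintuple A = (Q, Σ, •, I, F), where Q is a finite set of states, Σ is a finite input alphabet, • : Q × Σ → 2^Q is the transition function, which is extended to the domain 2^Q × Σ* in the natural way, I ⊆ Q is the set of initial states, and F ⊆ Q is the set of final states. The language accepted by A is the set L(A) = {w ∈ Σ* | I • w ∩ F ≠ ∅}. For a symbol a, we say that (p, a, q) is a transition in NFA A if q ∈ p • a, and for a string w, we write p −w→ q if q ∈ p • w. We say that (p, a, q) is an in-transition going to state q.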
An NFA A is deterministic (DFA) (and complete) if |I| = 1 and |q • a| = 1 for each q in Q and each a in Σ. In such a case, we write q • a = q′ instead of q • a = {q′}.
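The state complexity of a regular language L, sc(L), is the smallest number of states in any DFA for L. The state complexity of a binary regular operation ∘ is defined as a function f(m, n) given by the worst-case state complexity of K ∘ L over regular languages K and L accepted by an m-state and an n-state DFA, respectively. Every NFA A = (Q, Σ, •, I, F) can be converted to an equivalent DFA A′ whose states are the subsets of Q, whose initial state is I, and whose final states are the subsets that contain a state of F [12]. The DFA A′ is called the subset automaton of the NFA A. The subset automaton may not be minimal since some of its states may be unreachable or equivalent to other states.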
In the following proposition, we provide a sufficient condition for an NFA, which guarantees that the corresponding subset automaton does not have equivalent states.
Proposition 1. Let N = (Q, Σ, •, I, F) be an NFA. Assume that for each state q in Q, there is a string w_q in Σ* which is accepted by N only from the state q, that is, we have q • w_q ∩ F ≠ ∅, and p • w_q ∩ F = ∅ if p ≠ q. Then the subset automaton of N does not have equivalent states.
Proof. Let S and T be two distinct subsets of the subset automaton. Then, without loss of generality, there is a state q with q ∈ S \ T. Then the string w_q is accepted by the subset automaton from the subset S, but it is rejected from T. □
To describe a string w_q accepted by an NFA only from state q, we usually use the next observation.
Proposition 2. Let a string w_q be accepted by an NFA N only from state q. If (p, a, q) is the unique in-transition going to state q by symbol a, then the string aw_q is accepted by N only from state p.
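The condition of Proposition 1 can be checked mechanically once candidate strings w_q are given. The following Python sketch is an illustration only; the dictionary encoding of the transition function and the names run, accepted_only_from, and satisfies_proposition_1 are assumptions made for this example. The sample NFA is a three-state chain 0 → 1 → 2 on symbol a with the single final state 2; starting from w_2 = ε, Proposition 2 gives w_1 = a and w_0 = aa.

```python
def run(trans, start_states, word):
    """Set of states reached from start_states by reading word."""
    current = set(start_states)
    for symbol in word:
        current = {t for p in current for t in trans.get((p, symbol), ())}
    return current


def accepted_only_from(trans, states, finals, q, word):
    """True if word is accepted from state q and from no other state."""
    accepting = {p for p in states if run(trans, {p}, word) & finals}
    return accepting == {q}


def satisfies_proposition_1(trans, states, finals, witnesses):
    """Check the hypothesis of Proposition 1 for the given strings w_q."""
    return all(
        accepted_only_from(trans, states, finals, q, witnesses[q])
        for q in states
    )


# Hypothetical sample NFA: 0 -a-> 1 -a-> 2, final state 2.
trans = {(0, "a"): {1}, (1, "a"): {2}}
states = {0, 1, 2}
finals = {2}
witnesses = {2: "", 1: "a", 0: "aa"}  # built backwards as in Proposition 2
```

Each transition of the chain is the unique in-transition on a into its target, which is exactly the situation in which Proposition 2 allows prepending a to the witness string.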
In what follows, we often need to describe the set of all the reachable subsets in a subset automaton. To do this, the following observation is useful.
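Proposition 3. Let N = (Q, Σ, •, I, F) be an NFA and D be its subset automaton. Let R be a family of subsets of Q such that (1) each set in R is reachable in D, (2) the initial subset I is in R, and (3) S • a is in R for each S in R and each a in Σ. Then R is the family of all reachable subsets of DFA D.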
Proof. Each set in R is reachable in D by (1). Let S be a reachable subset of D. Then there is a string w in Σ* such that S = I • w. We prove by induction on |w| that S is in R. If |w| = 0, then w = ε and S = I • ε = I, which is in R by (2). Now let w = va for a string v and a symbol a. By the induction hypothesis, the set I • v is in R. Therefore S = (I • v) • a is in R by (3). □
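The closure argument above can be tried out on a small example. The Python sketch below is illustrative only; the encoding of the NFA and the names step, reachable_family, and satisfies_2_and_3 are assumptions of this example. It computes the reachable subsets by a direct search and checks conditions (2) and (3) for a candidate family. A family satisfying (2) and (3) always contains every reachable subset; condition (1) is what excludes extra, unreachable sets.

```python
def step(trans, subset, symbol):
    """Transition of the subset automaton: S . a as a frozenset."""
    return frozenset(t for p in subset for t in trans.get((p, symbol), ()))


def reachable_family(trans, initial, alphabet):
    """All subsets reachable from the initial subset."""
    start = frozenset(initial)
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for a in alphabet:
            target = step(trans, current, a)
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def satisfies_2_and_3(trans, initial, alphabet, family):
    """Conditions (2) and (3): I is in the family, and it is closed under all symbols."""
    family = {frozenset(s) for s in family}
    return frozenset(initial) in family and all(
        step(trans, s, a) in family for s in family for a in alphabet
    )


# Hypothetical two-state NFA over {a, b} with initial subset {0}.
trans = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {1}}
reachable = reachable_family(trans, {0}, "ab")
```

In this example the reachable family is {{0}, {0, 1}}; the larger family that also contains {1} and ∅ still satisfies (2) and (3), but fails (1).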

State Complexity of Concatenation
Consider minimal DFAs A and B. Without loss of generality, we assume that the state set of A is {q_0, q_1, ..., q_{m−1}} with the initial state q_0, and the state set of B is {0, 1, ..., n − 1} with the initial state 0. Moreover, in both A and B, let us denote the transition function by •. This is not confusing since the state sets of A and B are disjoint. First, let us recall the construction of an NFA for the language L(A)L(B).

Construction of NFA for concatenation:
(DFA A and DFA B → NFA N for L(A)L(B))
Let A = ({q_0, q_1, ..., q_{m−1}}, Σ, •, q_0, F_A) and B = ({0, 1, ..., n − 1}, Σ, •, 0, F_B) be DFAs. Construct NFA N from DFAs A and B as follows:
(a) for each symbol a and each state q_i with q_i • a ∈ F_A, add transition (q_i, a, 0);
(b) the set of initial states of N is {q_0} if q_0 ∉ F_A, and it is {q_0, 0} otherwise;
(c) the set of final states of N is F_B.
In the subset automaton of NFA N constructed as above, each reachable subset is of the form {q_i} ∪ S, where S ⊆ {0, 1, ..., n − 1}, since A is deterministic and complete. Moreover, if q_i is a final state of A, then 0 ∈ S since N has the transition (q, a, 0) whenever a state q of A goes to a final state q_i on a symbol a. It follows that the subset automaton of N has at most m2^n − 2^{n−1} reachable subsets whenever A has a final state [11,16]. We write this upper bound as (m − 1)2^n + 2^{n−1}. The bound is known to be tight if m ≥ 1 and n ≥ 2 [6,11,16]. If m ≥ 1 and n = 1, then the language L accepted by B satisfies L = ∅ or L = Σ*, so the tight upper bound in this case is m. Hence we get the following result.
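The construction and the upper bound recalled above can be checked on small instances. The following Python sketch is an illustration under stated assumptions, not a definitive implementation: a DFA is encoded as a hypothetical triple (delta, initial, finals) with delta[(state, symbol)] = state, the states of A and B are tagged with "A" and "B" to keep them disjoint, and the number of states of the minimal DFA is obtained by the subset construction followed by a standard partition refinement. The names concat_nfa, sc_concat, and upper_bound are ours.

```python
def concat_nfa(dfa_a, dfa_b):
    """NFA for L(A)L(B) following rules (a)-(c) of the construction."""
    delta_a, init_a, finals_a = dfa_a
    delta_b, init_b, finals_b = dfa_b
    trans = {}
    for (q, s), r in delta_a.items():
        targets = {("A", r)}
        if r in finals_a:                  # rule (a): q . s is final in A
            targets.add(("B", init_b))
        trans[(("A", q), s)] = targets
    for (j, s), k in delta_b.items():
        trans[(("B", j), s)] = {("B", k)}
    initial = {("A", init_a)}
    if init_a in finals_a:                 # rule (b)
        initial.add(("B", init_b))
    finals = frozenset(("B", j) for j in finals_b)  # rule (c)
    return trans, frozenset(initial), finals


def sc_concat(dfa_a, dfa_b, alphabet):
    """Number of states of the minimal DFA for L(A)L(B)."""
    trans, initial, finals = concat_nfa(dfa_a, dfa_b)
    # Subset construction restricted to reachable subsets.
    step, seen, todo = {}, {initial}, [initial]
    while todo:
        current = todo.pop()
        for s in alphabet:
            target = frozenset(t for p in current for t in trans[(p, s)])
            step[(current, s)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # Partition refinement: merge equivalent reachable subsets.
    block = {S: int(bool(S & finals)) for S in seen}
    while True:
        ids, refined = {}, {}
        for S in seen:
            sig = (block[S],) + tuple(block[step[(S, s)]] for s in alphabet)
            refined[S] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = refined


def upper_bound(m, n):
    """The bound recalled above: m if n = 1, else (m - 1)2^n + 2^(n-1)."""
    return m if n == 1 else (m - 1) * 2**n + 2 ** (n - 1)


# Hypothetical sample: the 2-state DFA for binary strings ending in a.
ends_in_a = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}, 0, {1})
```

Calling sc_concat(ends_in_a, ends_in_a, "ab") computes the complexity of the concatenation of this language with itself, which can then be compared with upper_bound(2, 2).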

The Range of Possible Complexities
The aim of this section is to show that the whole range of complexities from 1 to f(m, n) for the concatenation operation can be produced using an alphabet that grows linearly with n.
(4) For each state q of NFA N, there exists a string w_q in Σ* accepted by N only from state q. Moreover, we have

(b) Assume for a contradiction that there is a set S in R and a symbol σ such that S • σ = {q_0}. Then we must have q_{m−1} ∈ S by (2). It follows that the initial state 0 of B must be in S since q_{m−1} is final in A. However, then S • σ ⊇ {q_0, 0 • σ}, a contradiction.
(c) By (4), the NFA N satisfies the condition in Proposition 1. Therefore the subset automaton D of N does not have equivalent states, and we have sc(L(A)L(B)) = |R|. □

Now, starting from automata A and B such that A, B, N, D, R satisfy conditions (1)-(4), our goal is to construct a minimal m-state DFA A_i and a minimal (n + 1)-state DFA B_i for i = 1, 2, 3 over the alphabet Σ ∪ {a_n, b_n} in such a way that A_i and B_i, the NFA N_i for L(A_i)L(B_i), the subset automaton D_i of N_i, and the family R_i of reachable states of D_i satisfy conditions (1)-(4) as well. Moreover, if |R| = α, then |R_1| = 2α, |R_2| = 2α − 1, and |R_3| = α + 1.

We construct automata A_i and B_i from automata A and B by adding a new state n to DFA B, and by adding the transitions on two new symbols a_n and b_n. The transitions on a_n are the same in all three constructions, and they guarantee that the string a_n is accepted by N_i only from state n. The transitions on b_n are used to reach the set {q_0, n} in D_1, the set {q_1, n} in D_2, and the set {q_{m−1}, 0, n} in D_3. We have to be careful with condition (4), especially in the third construction.

Lemma 6. Let A, B, N, D, R satisfy conditions (1)-(4). Let A_i, B_i for i = 1, 2, 3 be the DFAs resulting from Constructions 1, 2, 3, respectively. Let N_i be an NFA for L(A_i)L(B_i) constructed as described in Section 3, D_i be the corresponding subset automaton, and R_i be the family of all the reachable subsets in DFA D_i. Then all these automata satisfy conditions (1)-(4). Moreover, if |R| = α, then |R_1| = 2α, |R_2| = 2α − 1, and |R_3| = α + 1.

Proof. Since we do not change transitions on symbols in Σ on states of A and B, condition (1) is satisfied. Since the only new transition to q_0 is (

In each N_i, the string a_n is accepted only from state n. Moreover, in B_1 and B_2, state n goes to itself on each symbol in Σ. It follows that condition (4

so S can be reached from the initial state {q_0} by a string u_S over Σ. If, moreover, S ≠ {q_0}, then, by (3), S is reached from {q_1} by a string v_S.
In and every new set S ∪ {n} can be reached from {q_1}. Let us show that no other set is reachable in D_1. For each set S in R and each σ in Σ, we have

Our first aim is to show that each value in the range from m + n + 1 to f(m, n) may be attained by the state complexity of concatenation of m-state and n-state DFA languages provided that m ≥ 3. We show this by induction, with the basis proved in the next lemma.

First let i = 1, so α = (m − 2) + 6 = m + 4. Define a minimal m-state DFA A_{1,0} = ({q_0, q_1, ..., q_{m−1}}, {a, b, c, d}, •, q_0, {q_{m−1}}) where for each i in {0, 1, ..., m − 1},

Thus the subset automaton has m + 4 reachable subsets. Next, notice that each of these m + 4 subsets goes to one of them by each symbol in {a, b, c, d}. By Proposition 3, no other set is reachable, so the complexity of L(A_{1,0})L(B_{1,0}) is m + 4. Notice that all the possible subsets containing states q_{m−1} and q_{m−2} are reachable in D_{1,0}.

Now we construct appropriate DFAs from automata A_{1,0} and B_{1,0} by adding transitions on new symbols. Thus we do not change the transitions on symbols a, b, c, d, and therefore conditions (1) and (4) are always satisfied. Moreover, for each new symbol, the new transition is defined in such a way that condition (2) is satisfied as well. Finally, notice that {q_{m−1}, 0} is reachable from {q_1} by a^{m−2} in the subset automaton D_{1,0}. In what follows, we always reach new subsets in the corresponding subset automata for concatenation from the subset {q_{m−1}, 0}. Hence condition (3) is always satisfied.
Next, let α = 2(m − 2) + 6. Construct DFAs A_{2,0}, B_{2,0} from DFAs A_{1,0}, B_{1,0} by adding the transitions on a new symbol e_0 as follows: q_{m−1} • e_0 = q_0 and q_i • e_0 = q_{m−1} for i = 0, 1, ..., m − 2; 0 • e_0 = 0 and 1 • e_0 = 0. Construct the NFA N_{2,0} for L(A_{2,0})L(B_{2,0}). In the subset automaton D_{2,0}, all the sets that were reachable in the subset automaton D_{1,0} are reachable as well, since the transitions on the old symbols a, b, c, d are the same. For the same reason, the NFA N_{2,0} satisfies (4), and therefore the subset automaton D_{2,0} does not have equivalent states. Next, in D_{2,0}, we have

No other new set is reachable since each set {q_i, 0} goes either to a set {q_j, 0} or to a set containing q_{m−2} or q_{m−1} by each symbol in {a, b, c, d, e_0}, and moreover, by e_0, each set goes either to {q_0, 0} or to a set containing q_{m−1}. Therefore the resulting complexity of the concatenation L(A_{2,0})L(B_{2,0}) is 2(m − 2) + 6.
In a similar way, we construct DFAs A_{3,0}, B_{3,0} from A_{2,0}, B_{2,0} by adding transitions on a new symbol e_{01} defined as follows: q_{m−1} • e_{01} = q_0 and q_i • e_{01} = q_{m−1} for i = 0, 1, ..., m − 2; 0 • e_{01} = 0 and 1 • e_{01} = 1. This results in the reachability of m − 2 new subsets {q_i, 0, 1} in the subset automaton of N_{3,0}. Since no other new set is reachable, the complexity of

Finally, construct DFAs A_{4,0}, B_{4,0} from A_{3,0}, B_{3,0} by adding the transitions on a new symbol e_1 defined as q_{m−1} • e_1 = q_0 and q_i • e_1 = q_{m−1} for i = 0, 1, ..., m − 2; 0 • e_1 = 1 and 1 • e_1 = 1. This results in the reachability of subsets {q_i, 1} in the subset automaton of N_{4,0}, and the complexity of

Up to now we have defined appropriate automata A_{i,0} and B_{i,0} for the values α = i(m − 2) + 6 for i = 1, 2, 3, 4. Now let us consider an intermediate value α = i(m − 2) + 6 + j where 1 ≤ i ≤ 3 and 1 ≤ j ≤ m − 3. Construct DFAs A_{i,j} and B_{i,j} from automata A_{i,0} and B_{i,0} by adding the transitions on a new symbol f_1 as follows:

This results in the reachability of the following j new subsets in the subset automaton of N_{i,j}:

Recall that the subset automaton of N_{i,0} has i(m − 2) + 6 reachable states, and since i ≤ 3, the subsets {q_i, 1} are unreachable in the subset automaton of N_{i,0}. Hence the resulting complexity of L(A_{i,j})L(B_{i,j}) is i(m − 2) + 6 + j as desired. Moreover, all the automata satisfy conditions (1)-(4).
Finally, notice that if A and B are the DFAs over {a, b, c} shown in Fig. 1, then sc(L(A)L(B)) = m + 3. This concludes our proof.
□

Now we are ready to prove the main lemma. Recall that the state complexity of concatenation is f(m, n) = (m − 1)2^n + 2^{n−1} if n ≥ 2.

Lemma 8. Let m ≥ 3 and n ≥ 2. For each α with m + n + 1 ≤ α ≤ f(m, n), there exist a minimal m-state DFA A and a minimal n-state DFA B, both defined over an alphabet Σ with |Σ| ≤ 2n + 4, such that sc(L(A)L(B)) = α.
Proof. We prove the claim by induction on n. Moreover, in the induction hypothesis, we assume that DFAs A and B, the corresponding NFA N for L(A)L(B) constructed as in Section 3, the subset automaton D of N, and the set R of reachable states of D satisfy conditions (1)-(4).
The basis, in which we have m ≥ 3, n = 2, and m + 3 ≤ α ≤ f(m, 2), holds true by Lemma 7. Let n ≥ 2, and assume that for each β with m + n + 1 ≤ β ≤ f(m, n), there exist a minimal m-state DFA A and a minimal n-state DFA B, both defined over an alphabet Σ with |Σ| ≤ 2n + 4, such that sc(L(A)L(B)) = β. Moreover, assume that DFAs A and B, the NFA N for L(A)L(B), the subset automaton D of N, and the set of reachable states R of D satisfy conditions (1)-(4). Let us show that the claim holds for n + 1. To this aim, let α be an integer with m + (n + 1) + 1 ≤ α ≤ f(m, n + 1).
First, let 2m + 2n + 2 ≤ α ≤ f(m, n + 1) and α be even. Let β = α/2. Then m + n + 1 ≤ β ≤ f(m, n), and by the induction hypothesis, there exist a minimal m-state DFA A and a minimal n-state DFA B, both defined over an alphabet Σ with |Σ| ≤ 2n + 4, such that sc(L(A)L(B)) = β. Moreover, conditions (1)-(4) are satisfied for A, B, N, D, R. We use Construction 1, in which we add a new state to DFA B and the transitions on two new symbols, to get a minimal m-state DFA A_1 and a minimal (n + 1)-state DFA B_1. By Lemma 6, all conditions (1)-(4) are satisfied for

and we use the induction hypothesis and our Construction 2 to get automata A_2 and B_2 over

We use the induction hypothesis and Construction 3 to get appropriate automata A_3 and B_3 satisfying (1)-(4) such that sc(L(A_3)L(B_3)) = β + 1 = α. Our proof is complete.
□

Now we consider the case of m = 2 and n ≥ 2. In such a case, we only need to modify conditions (1)-(4). All the proofs are the same as above, except for the base case, which is a bit more complicated.

Proof. We modify conditions (1)-(4) as follows.
(4′) For each state q of NFA N, there exists a string w_q in Σ* which is accepted by N only from state q. Moreover, we have

Now we continue with exactly the same constructions as in the case of m ≥ 3, and, using induction on n, we get the lemma. □
The case of m = 1 and n ≥ 3 is slightly different, although the main idea is the same.

Proof (Proof Idea).
Let A be a 1-state DFA accepting Σ*. We prove the lemma again by induction on n, where we assume that the following conditions hold for DFA B, for the NFA N for Σ*L(B) constructed from B by adding a loop in the initial state 0 on each input symbol in Σ, for the subset automaton D of N, and for the set R of reachable subsets in D:
(1″) In DFA B, the transitions on a, b, c in states 0, 1, 2 are as in Fig. 2.
(2″) In DFA B, we have 0 • σ ≠ 0 for each σ ∈ Σ.
(3″) Each subset in R, except for the initial subset {0}, can be reached from the subset {0, 1}.
(4″) NFA N satisfies the condition in Proposition 1, that is, for each state j of N, there exists a string w_j in Σ* which is accepted by N only from state j. Moreover, we have w_0 = c and w_1 = ε.
The basis, in which we have n = 3 and n + 1 = f(1, 3) = 4, holds true since the 3-state DFA B shown in Fig. 2 satisfies (1″)-(4″).
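For the induction step, we again describe three constructions: We construct (n + 1)-state DFAs B_1, B_2, B_3 from DFA B by adding a new state n, and by adding transitions on new symbols a_n, b_n, as shown in Table 2 in columns C1, C2, and C3, respectively. We can show that all the resulting automata satisfy conditions (1″)-(4″), and, moreover, if

This proves the lemma by induction.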
□

Up to now we have considered the complexities in the range from m + n + 1 to f(m, n). The complexities from 1 to m + n − 1 are covered by the following result from [8]. Notice that this lemma also covers the case of m = 1 and n = 2, since then f(1, 2) = 2 = m + n − 1. The next lemma shows that the complexity m + n can be produced. Then we consider the case of n = 1.

Lemma 12. Let m ≥ 2, n ≥ 2. There exist binary regular languages K and L with sc(K) = m and sc(L) = n such that sc(KL) = m + n.

Proof. Let K and L be the binary languages accepted by minimal DFAs A and B shown in Fig. 3, where for each i in {0, 1, ..., m − 1} and j in {0, 1, ..., n − 1}, we have

• a = 0, and j • b = n − 1. Construct an NFA N from DFAs A and B by adding transitions (q_{m−2}, a, 0), (q_{m−1}, a, 0), and (q_i, b, 0) for each i; the initial state of N is q_0, and the set of final states is {n − 1}. In the corresponding subset automaton, the initial subset is {q_0}, and we have

→ {q_{m−1}, 0, 1, ..., j} for j = 1, 2, ..., n − 1, and

It follows that the subset automaton has m + n reachable subsets. Notice that each of these m + n subsets goes to one of them by a, and each of them goes to {q_{m−1}, 0} or to {q_{m−1}, 0, n − 1} by b. By Proposition 3, no other set is reachable.
To prove distinguishability, let {q_i} ∪ S and {q_j} ∪ T be two distinct reachable subsets. Since NFA N accepts the string a^{n−1−t} only from state t (0 ≤ t ≤ n − 1), the two subsets are distinguishable if S ≠ T. Next, if i < j, then we have

where the resulting subsets are distinguishable since they differ in a state of B. This proves distinguishability and concludes the proof.

The next theorem summarizes our results, and shows that the whole range of complexities for the concatenation operation can be produced using an alphabet which grows linearly with n. Recall that f(m, n) is the state complexity of the concatenation operation on languages over an alphabet of size at least two, and we have f(m, 1) = m and f(m, n) = (m − 1)2^n + 2^{n−1} if n ≥ 2.

Theorem 14. Let m, n ≥ 1. For each α with 1 ≤ α ≤ f(m, n), there exist regular languages K and L defined over an alphabet Σ with |Σ| ≤ 2n + 4 such that sc(K) = m, sc(L) = n, and sc(KL) = α.
Proof. In each of the following six cases, we refer to the corresponding lemma dealing with this case:

This covers all the possible cases, and proves the theorem. □
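The way the range from 1 to f(m, n) is split among the results above can be summarized in a few lines of Python. This is a sketch for orientation only: the function names and the labels "low", "middle", and "high" are hypothetical, and the three-way split follows the ranges 1 to m + n − 1, the single value m + n, and m + n + 1 to f(m, n) discussed above, rather than the six cases of the proof.

```python
def f(m, n):
    """State complexity of concatenation as recalled before Theorem 14."""
    return m if n == 1 else (m - 1) * 2**n + 2 ** (n - 1)


def regime(m, n, alpha):
    """Which part of the range 1..f(m, n) the target alpha falls into."""
    if not 1 <= alpha <= f(m, n):
        raise ValueError("alpha must satisfy 1 <= alpha <= f(m, n)")
    if alpha <= m + n - 1:
        return "low"      # covered by the result quoted from [8]
    if alpha == m + n:
        return "middle"   # the value m + n is treated separately
    return "high"         # handled by induction on n


def doubling_holds(m, n):
    """For n >= 2 the bound doubles when B gets one more state."""
    return f(m, n + 1) == 2 * f(m, n)
```

The doubling identity f(m, n + 1) = 2f(m, n) for n ≥ 2 is what allows the rules α → 2α and α → 2α − 1 to reach the top of the range for n + 1 from the range for n.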

Conclusions
We investigated the state complexity of languages resulting from the concatenation operation. We proved that for all m, n, α with m, n ≥ 1 and 1 ≤ α ≤ f(m, n), where f(m, n) is the state complexity of the concatenation operation, there exist regular languages K and L defined over an alphabet of size at most 2n + 4 such that sc(K) = m, sc(L) = n, and sc(KL) = α. This improves the result from [8], where an alphabet of size growing exponentially with n is used to produce the whole range of complexities for the concatenation operation. Our result complements similar results from [10,14], where a linear alphabet is used to get the whole range of complexities for the reversal and Kleene closure operations.
A similar problem for the square operation, defined as L^2 = LL, remains open even for an exponential alphabet.

..., m − 2, and w_j = a_j for j = 2, 3, ..., n − 1.

Proposition 5. Let A, B, N, D, and R satisfy conditions (1)-(4). Then
(a) The sets {q_1}, {q_{m−1}, 0}, {q_{m−1}, 0, 1}, {q_{m−2}, 0, 1} are in R.
(b) The initial subset {q_0} of the subset automaton D cannot be reached from any other reachable subset of D.
(c) The subset automaton D of NFA N does not have equivalent states, so sc(L(A)L(B)) = |R|.

Proof. (a) By (1), the transitions on a, b, c are as in Fig. 1. It follows that in the subset automaton D, we have

) is satisfied for N_1 and N_2. In B_3, we have n • c = 0 and n • b = 0 • b = b. It follows that (0, c, 1) is the only transition on c going to state 1, and (q_{m−1}, b, 0) is the only transition on b going to state 0. It follows that (4) is satisfied for N_3 as well. Now consider the subset automata D_1, D_2, D_3. Since we did not change transitions on symbols in Σ on states in A and B, we have

, and (S ∪ {n}) • b_n ∈ {{q_0, n}, {q_{m−1}, 0, n}}. Using Proposition 5(a), we get that all the resulting sets are

Lemma 7. Let m ≥ 3 and n = 2. For each α with m + 3 ≤ α ≤ f(m, 2) = 4m − 2, there exist a minimal m-state DFA A and a minimal 2-state DFA B, both defined over an alphabet Σ with |Σ| ≤ 7, such that sc(L(A)L(B)) = α. Moreover, the corresponding NFA N for L(A)L(B), the subset automaton D of N, and the set R of reachable states of D satisfy conditions (1)-(4).

Proof. We first consider the values α = i(m − 2) + 6 for i = 1, 2, 3, 4. Then we consider all the intermediate values of α. Finally, we deal with the case α = m + 3.

Lemma 11 ([8, Lemma 5]). Let m, n ≥ 1. For each α with 1 ≤ α ≤ m + n − 1, there exist a minimal m-state DFA A and a minimal n-state DFA B, both defined over an alphabet of at most two symbols, such that sc(L(A)L(B)) = α.

Table 2.

Fig. 3. The minimal DFAs A and B with sc(L(A)L(B)) = m + n.

Table 1. New transitions; i ∈ {0, 1, ..., m − 1}, j ∈ {0, 1, ..., n − 1}.

Construction 1. (α → 2α) Construct DFAs A_1 and B_1 from DFAs A and B as follows:
(1) add a new state n to DFA B going to itself on each old symbol σ in Σ;
(2) add the transitions on two new symbols a_n and b_n as shown in Table 1 in column C1.

Construction 2. (α → 2α − 1) Construct DFAs A_2 and B_2 from DFAs A and B as follows:
(1) add a new state n to DFA B going to itself on each old symbol σ in Σ;
(2) add the transitions on two new symbols a_n and b_n as shown in Table 1 in column C2.

Construction 3. (α → α + 1) Construct DFAs A_3 and B_3 from DFAs A and B as follows:
(1) add a new state n to DFA B with n • c = 1 and n • σ = 0 • σ if σ ∈ Σ \ {c};
(2) add the transitions on two new symbols a_n and b_n as shown in Table 1 in column C3.

The (n + 1)-state DFAs B_1, B_2, B_3 for the case m = 1 are obtained from DFA B by adding a new state n, and by adding transitions on new symbols a_n, b_n, as shown in Table 2 in columns C1, C2, and C3, respectively.