Descriptional Complexity and Operations – Two Non-classical Cases

For a language family $\mathcal{L}$, a syntactic complexity measure $K$ defined on the languages of $\mathcal{L}$, a number $n \ge 1$, and an $n$-ary operation $\circ$ under which $\mathcal{L}$ is closed, we define $g^{K}_{\circ}(m_1, m_2, \ldots, m_n)$ as the set of all integers $r$ such that there are $n$ languages $L_i$, $1 \le i \le n$, with $K(L_i) = m_i$ for $1 \le i \le n$ and $K(\circ(L_1, L_2, \ldots, L_n)) = r$. In this paper we study these sets for the operations union, catenation, star, complement, set-subtraction, and intersection and the measure number of accepting states (defined for regular languages), as well as for reversal, union, catenation, and star and the measures number of nonterminals, productions, and symbols (defined for context-free languages). Moreover, we discuss how these sets change if one restricts to finite languages, unary languages, and finite unary languages.


Introduction
The state complexity sc(L) of a regular language L is defined as the minimal number of states that are sufficient and necessary for a deterministic finite automaton to accept L. The study of the state complexity of regular languages is a central topic in theoretical computer science, and it is also of great importance in applied fields. The first important results by Lupanov, Moore, Meyer, Fischer and others date back to the sixties and seventies, i. e., to the beginning of theoretical computer science.
In the last three decades the following problem has been intensively investigated: Given a binary regularity-preserving operation $\bullet$ and two numbers $m$ and $n$, determine the maximal number $k$ (denoted by $f^{\mathrm{sc}}_{\bullet}(m, n)$) such that there are languages $L_m$ and $L_n$ with $\mathrm{sc}(L_m) = m$, $\mathrm{sc}(L_n) = n$, and $\mathrm{sc}(L_m \bullet L_n) = k$. (We have introduced the concept for binary operations, but it can be used for unary, ternary etc. operations as well.) Summaries of the study of $f^{\mathrm{sc}}_{\bullet}$ can be found in the papers [29] and [12].
As for other problems concerning the state complexity, one has noticed that the behaviour of the complexity under operations can change considerably if one restricts to finite or unary or finite unary languages. This can be seen from the following [display missing in the source text].
We mention that there are also many papers where other subfamilies of the family of regular languages have been studied, e. g., star-free languages, union-free languages, and languages closed under certain subword operations. Examples of such results can be found in [16] and [22].
There are also some results where, instead of the maximal number $f^{\mathrm{sc}}_{\bullet}$, the set $g^{\mathrm{sc}}_{\bullet}$ of all numbers $k$ such that there are languages $L_m$ and $L_n$ with $\mathrm{sc}(L_m) = m$, $\mathrm{sc}(L_n) = n$, and $\mathrm{sc}(L_m \bullet L_n) = k$ is determined. We mention here three such results.
Obviously, the problem of the behaviour of syntactic measures of complexity under operations can be discussed in other cases, too, where one can change the complexity measure and/or the considered type of automaton. We mention here the following approaches.
The most natural extension of deterministic finite automata is given by nondeterministic finite automata. The operational behaviour of the (nondeterministic) state complexity is studied in [18] and summarized in [19].
The number of transitions is not of interest for complete deterministic finite automata, but the situation changes if one allows incomplete finite automata. In the papers [13] and [23], one can find results on the behaviour of the number of transitions under operations.
In order to cover XML structures, one has to extend the usual finite automata, which leads to nested word automata, visibly pushdown automata, or input-driven pushdown automata. The state complexity of these automata under operations is studied in the papers [1], [25], and [24].
Another natural extension of finite automata over strings is given by tree automata. There are also some results for their operational state complexity (see [26] and [27]).
In this paper we summarize results on the behaviour of some further nonclassical measures of descriptional complexity under operations.
First we consider the number of accepting states instead of the number of (all) states. For a regular language L, the number asc(L) is defined as the minimal number of accepting states that are sufficient and necessary for a deterministic finite automaton to accept L. We mention two points of interest in this measure:
- It was shown that, for two languages $L_m$ and $L_n$ with $\mathrm{sc}(L_m) = m$ and $\mathrm{sc}(L_n) = n$, the relation $\mathrm{sc}(L_m \cdot L_n) \le m 2^{n} - \mathrm{asc}(L_m) \cdot 2^{n-1}$ holds for $m \ge 2$ and $n \ge 1$, and that the bound is optimal. For the Kleene closure and the cut operation, the complexity of the resulting language also depends on the number of accepting states (see [12] and [9]).
- The complexity of algorithms for the minimization of Büchi automata and for model checking based on Büchi automata depends on the number of accepting states of the Büchi automaton (see [5] and [10]).
We determine the sets $g^{\mathrm{asc}}_{\bullet}$ for union, complement, set-subtraction, Kleene star, catenation, and intersection. Furthermore, we discuss variants of $g^{\mathrm{asc}}_{\bullet}$ where we restrict the sets $L_m$ and $L_n$ to be finite sets, regular unary sets, or finite unary sets. The comparison shows that, for most operations, the situation is similar for arbitrary regular sets, finite sets, and regular unary sets, whereas finite unary sets show a completely different behaviour.
Some of these results were already published in [6].
Furthermore, we consider context-free languages and define their syntactic complexity as the minimal number of nonterminals or productions or symbols that is necessary to generate the language by context-free grammars. These measures were introduced and studied by Gruska in [14] and [15]. We summarize some results obtained in cooperation with Ralf Stiebe and Ronny Harbich (see [7], [8], and [17]) for the operational behaviour of these measures for arbitrary context-free languages under reversal, union, catenation, and star. Moreover, we add results for the case of finite, unary context-free, and finite unary languages. For the number of variables, there is a large difference, which comes from the fact that the complexity of finite and of unary context-free languages is bounded by one and two, respectively. For the number of productions, it seems that the difference between arbitrary and finite sets is essentially that we miss one or two "large" values.

Definitions and Notations
We assume that the reader is familiar with the basic notions of the theory of automata and formal languages; for details we refer to [28]. Here we only fix some notation and define the complexity measures of regular and context-free languages which are considered in this paper.
By card(M), we denote the cardinality of a set M. The empty word is denoted by λ. By $\mathbb{N}$, we denote the set of all positive integers. If L is a language, then we define the complement C(L) of L as the set of all words $w \in V^*$ which are not contained in L, where V is the minimal set (with respect to inclusion) with $L \subseteq V^*$.
We specify a (deterministic) finite automaton as a tuple $A = (Q, X, q_0, F, \delta)$ where $Q$ and $X$ are finite sets of states and inputs, respectively, $q_0 \in Q$ is a distinguished state (the initial state), $F$ is a subset of $Q$ (the set of accepting states), and $\delta$ is a function from $Q \times X$ into $Q$. The language accepted by $A$ is denoted by $L(A)$.
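To make the tuple notation concrete, the following minimal Python sketch represents a complete deterministic finite automaton by its alphabet, initial state, accepting states, and transition function, and counts the states and the accepting states that remain after removing unreachable states and merging equivalent ones (Moore's partition refinement). The function names `reachable_states` and `minimal_counts` are hypothetical, and the sketch assumes the standard fact that the minimal complete automaton of a language has both the minimal number of states and the minimal number of accepting states.

```python
def reachable_states(alphabet, q0, delta):
    """Return the set of states reachable from the initial state q0."""
    seen, todo = {q0}, [q0]
    while todo:
        q = todo.pop()
        for x in alphabet:
            r = delta[(q, x)]
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen


def minimal_counts(alphabet, q0, accepting, delta):
    """Return (states, accepting states) of the minimal complete DFA.

    delta is a dict mapping (state, letter) to a state and must be total.
    """
    alphabet = sorted(alphabet)
    live = reachable_states(alphabet, q0, delta)
    # Start with the partition into accepting and non-accepting states.
    block = {q: int(q in accepting) for q in live}
    while True:
        # Two states stay together only if they are in the same block
        # and all their successors are in the same blocks.
        signature = {
            q: (block[q],) + tuple(block[delta[(q, x)]] for x in alphabet)
            for q in live
        }
        ids = {}
        refined = {q: ids.setdefault(signature[q], len(ids)) for q in live}
        if len(ids) == len(set(block.values())):
            break  # no block was split, the partition is stable
        block = refined
    n_states = len(set(block.values()))
    n_accepting = len({block[q] for q in live if q in accepting})
    return n_states, n_accepting


# Assumed example: a redundant automaton for the words over {a, b} ending in a.
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 2, (1, "b"): 0,
         (2, "a"): 1, (2, "b"): 0}
print(minimal_counts("ab", 0, {1, 2}, delta))  # (2, 1)
```

The two accepting states 1 and 2 are equivalent, so the language has state complexity 2 and a single accepting state suffices.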
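For a finite automaton $A = (Q, X, q_0, F, \delta)$ and a regular language $L$, we set
$$\mathrm{sc}(A) = \mathrm{card}(Q), \qquad \mathrm{asc}(A) = \mathrm{card}(F),$$
$$\mathrm{sc}(L) = \min\{\mathrm{sc}(A) \mid L(A) = L\}, \qquad \mathrm{asc}(L) = \min\{\mathrm{asc}(A) \mid L(A) = L\}.$$
A context-free grammar is specified as a quadruple $G = (N, T, P, S)$ where $N$ and $T$ are two finite and disjoint sets, $P$ is a finite subset of $N \times (T \cup N)^*$, and $S$ is a distinguished element of $N$. The elements of $N$, $T$, and $P$ are called nonterminals, terminals, and productions, respectively, and $S$ is called the axiom. We write $A \to w$ instead of $(A, w) \in P$. By $L(G)$, we denote the language generated by $G$.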
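For a context-free grammar $G = (N, T, P, S)$, we set $\mathrm{Var}(G) = \mathrm{card}(N)$ and $\mathrm{Prod}(G) = \mathrm{card}(P)$, and we denote by $\mathrm{Symb}(G)$ the number of symbols of $G$ [defining formula missing in the source text; see [14] and [15]]. Let $K \in \{\mathrm{Var}, \mathrm{Prod}, \mathrm{Symb}\}$. For a context-free language $L$, we set
$$K(L) = \min\{K(G) \mid L(G) = L\}.$$
For a language family $\mathcal{L}$, a syntactic complexity measure $K$ defined on the languages of $\mathcal{L}$, a number $n \ge 1$, and an $n$-ary operation $\bullet$ under which $\mathcal{L}$ is closed, we define $g^{K}_{\bullet}(m_1, m_2, \ldots, m_n)$ as the set of all integers $r$ such that there are $n$ languages $L_i$, $1 \le i \le n$, with $K(L_i) = m_i$ for $1 \le i \le n$ and $K(\bullet(L_1, L_2, \ldots, L_n)) = r$.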
If we additionally require that the languages $L_i$, $1 \le i \le n$, are finite or unary or finite unary, we use the notations $g^{K,f}_{\bullet}$, $g^{K,u}_{\bullet}$, and $g^{K,f,u}_{\bullet}$, respectively.

Number of Accepting States
In this section we only consider regular languages; therefore we omit the adjective "regular".
With respect to the measure number of accepting states, we consider the operations complement, union, set-subtraction, catenation, star, and intersection.
We start with complement.
Theorem 1. The following relations hold for the operation complement: [relations missing in the source text]
We see that the only difference is that 0 is not in $g^{\mathrm{asc},f}_{C}(1)$ and not in $g^{\mathrm{asc},f,u}_{C}(1)$. This difference comes from the following observations:
- asc(∅) = 0 holds, and asc(L) ≥ 1 iff L is not empty,
- the complement of the empty set is the non-finite set $V^*$ of all words (and $\mathrm{asc}(V^*) = 1$).
We now give the results for union, set-subtraction, and star and compare them afterwards.
Theorem 2. The following relations hold for the operation union: [relations missing in the source text]
Theorem 3. The following relations hold for the operation set-subtraction: [relations missing in the source text]
[Theorem 4, on the star operation, is missing in the source text.]
First we mention that, for all three operations (union, set-subtraction, and star), there is no difference between the situations for arbitrary sets and arbitrary unary sets.
Moreover, for all three operations, a difference between allowing arbitrary sets and restricting to finite sets only occurs for m = 1. In all cases, it essentially comes from the following lemma.
Lemma 1. Let L be a nonempty finite language. Then asc(L) = 1 if and only if L is prefix-free (i. e., no proper prefix of a word w ∈ L is in L).
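Lemma 1 can be checked by brute force on small examples. The following Python sketch is only an illustration and rests on an assumption that is not stated in the text: by the Myhill-Nerode correspondence, the accepting states of the minimal automaton of L correspond to the distinct left quotients $u^{-1}L = \{v \mid uv \in L\}$ with $u \in L$, so for a finite language their number can be counted directly. The helper names are hypothetical.

```python
def left_quotient(u, language):
    """The set of all words v such that u + v belongs to the language."""
    return frozenset(w[len(u):] for w in language if w.startswith(u))


def asc_finite(language):
    """Count the distinct left quotients by words of a finite language.

    Under the assumption named above, this equals the number of
    accepting states of the minimal deterministic automaton.
    """
    return len({left_quotient(u, language) for u in language})


def is_prefix_free(language):
    """True if no word of the language is a proper prefix of another one."""
    return not any(u != w and w.startswith(u)
                   for u in language for w in language)


print(asc_finite({"ab", "ba"}), is_prefix_free({"ab", "ba"}))  # 1 True
print(asc_finite({"a", "ab"}), is_prefix_free({"a", "ab"}))    # 2 False
```

In the second example the quotient by a is {λ, b} and the quotient by ab is {λ}, so two accepting states are needed.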
If we now assume that asc(L ∪ L′) = 1 for two nonempty finite languages L and L′, then L ∪ L′ is prefix-free, and consequently L and L′ are prefix-free, too, which gives asc(L) = asc(L′) = 1.
Moreover, if asc(L) = 1 and thus L is prefix-free, we get that L \ L′ is prefix-free for all languages L′. Therefore, we get [relation missing in the source text].
However, we see that there are differences between the situations for finite sets and finite unary sets on the one hand and for arbitrary unary sets and finite unary sets on the other hand. These differences come from the fact that, for a unary finite set L with n elements, we have asc(L) = n, from which the statements for finite unary sets follow.
We now turn to catenation where, except for some trivial cases, we have no results for unary sets.
Theorem 5. The following relations hold for the operation catenation: [relations missing in the source text]
Note that we do not know whether there is a difference between arbitrary and finite sets, since $g^{\mathrm{asc},f}_{\cdot}$ is not completely determined.
For all the preceding operations $\bullet$, $g^{\mathrm{asc}}_{\bullet}(m, n)$ was almost the set of all positive integers. This changes completely if we consider intersection. Let L and L′ be two regular sets accepted by the finite automata A and A′, respectively. Then the standard construction of an automaton B accepting L ∩ L′ gives the upper bound asc(A) · asc(A′) for asc(L ∩ L′). Hence $g^{\mathrm{asc}}_{\cap}(m, n)$ contains only numbers ≤ m · n.
Theorem 6. The following relation holds for the operation intersection: For m ≥ 0 and n ≥ 0, [relation missing in the source text]
We mention some easy consequences:
- $g^{\mathrm{asc}}_{\cap}(0, n) = g^{\mathrm{asc}}_{\cap}(m, 0) = \{0\}$ for m ≥ 0 and n ≥ 0,
- $m \cdot n \in g^{\mathrm{asc}}_{\cap}(m, n)$ for m ≥ 0 and n ≥ 0,
- for m ≥ 0, we have $g^{\mathrm{asc}}_{\cap}(m, m) \supseteq \{r \mid 0 \le r \le \frac{4m}{9}\}$, i. e., a large section of small numbers is in $g^{\mathrm{asc}}_{\cap}(m, m)$.
For the unary case and intersection, we have the following statements:
Theorem 7. The following relation holds for the operation intersection: For m ≥ 0 and n ≥ 0, [relation (1) missing in the source text] and $g^{\mathrm{asc},f,u}_{\cap}(m, n) = \{0, 1, \ldots, \min\{m, n\}\}$.
By the proof of (1), one can give an extension for m = n and numbers k and l with 0 ≤ k ≤ m and 0 ≤ l ≤ n, but one has to use min{m, n} and min{k, l} in the formulation, which makes the formulae a little bit hard to read. We have no useful result for $g^{\mathrm{asc},f}_{\cap}(m, n)$.
We have seen above that there are differences between the general case and the unary case. Thus it is a natural question whether there are further differences if we restrict the size of the underlying alphabet, i. e., if we consider the binary, ternary etc. case. We mention that all results presented above for arbitrary regular languages and finite languages already hold for alphabets with at least two letters. Therefore it is not necessary to distinguish by the size of the alphabet.
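The product construction behind the bound asc(L ∩ L′) ≤ asc(A) · asc(A′) used in this section can be sketched in Python as follows. The representation of an automaton as a tuple (states, initial state, accepting states, transition dictionary) and the helper names `mod_dfa`, `product_dfa`, and `accepts` are assumptions made for this illustration only. The accepting states of the product automaton are exactly the pairs of accepting states, so there are asc(A) · asc(A′) of them before any minimization.

```python
def mod_dfa(k, residues):
    """Unary DFA over {a} accepting all words whose length mod k is in residues."""
    states = set(range(k))
    delta = {(q, "a"): (q + 1) % k for q in states}
    return states, 0, set(residues), delta


def product_dfa(first, second, alphabet):
    """Product automaton accepting the intersection of the two languages."""
    states1, init1, acc1, delta1 = first
    states2, init2, acc2, delta2 = second
    states = {(p, q) for p in states1 for q in states2}
    # Both components read the same letter simultaneously.
    delta = {((p, q), x): (delta1[(p, x)], delta2[(q, x)])
             for (p, q) in states for x in alphabet}
    # A pair is accepting iff both components are accepting.
    accepting = {(p, q) for p in acc1 for q in acc2}
    return states, (init1, init2), accepting, delta


def accepts(dfa, word):
    """Run the automaton on the word and report acceptance."""
    _, state, accepting, delta = dfa
    for x in word:
        state = delta[(state, x)]
    return state in accepting


A = mod_dfa(3, {0, 1})   # two accepting states
B = mod_dfa(2, {0})      # one accepting state
P = product_dfa(A, B, "a")
print(len(P[2]))         # 2
```

The product automaton need not be minimal, so the count only gives an upper bound for the number of accepting states of the intersection.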

Syntactic Complexity Measures for Context-Free Languages
In this section we only consider context-free languages; therefore we omit the adjective "context-free". We start with the remark that Var(L) ≤ 1 for any finite language L (the set consisting of the words $w_1, w_2, \ldots, w_n$ is generated by a grammar with the rules $S \to w_1$, $S \to w_2$, ..., $S \to w_n$) and Var(L) ≤ 2 for any unary context-free language (the language $L = \{a^{n_1}, a^{n_2}, \ldots, a^{n_s}\} \cup (a^{p})^{*}\{a^{m_1}, a^{m_2}, \ldots, a^{m_t}\}$ is generated by a grammar with the rules $S \to a^{n_1}$, $S \to a^{n_2}$, ..., $S \to a^{n_s}$, $S \to S'$, $S' \to a^{p} S'$, $S' \to a^{m_1}$, $S' \to a^{m_2}$, ..., $S' \to a^{m_t}$). Thus, the sets $g^{\mathrm{Var},f}_{\bullet}(m, n)$ are not defined if m ≥ 2 or n ≥ 2, and the sets $g^{\mathrm{Var},u}_{\bullet}(m, n)$ are not defined if m ≥ 3 or n ≥ 3. We now present the results for the reversal operation.
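From commutativity of union and Theorem 8, it follows that, for the measures $K \in \{\mathrm{Var}, \mathrm{Prod}, \mathrm{Symb}\}$,
$$g^{K}_{\cup}(m, n) = g^{K}_{\cup}(n, m) \quad \text{and} \quad g^{K}_{\cdot}(m, n) = g^{K}_{\cdot}(n, m),$$
and the corresponding relations also hold if we restrict to finite languages, unary languages, and finite unary languages. Therefore we can assume without loss of generality that m ≥ n if we discuss union or product. We have the following results concerning operations and the number of variables. [The theorems on the number of variables and Figure 3 are missing in the source text.]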
We see that there is a large difference between arbitrary context-free languages on the one hand and finite or unary sets on the other hand, but the difference only originates from the restricted domain of $g^{\mathrm{Var},f}_{\bullet}$ and $g^{\mathrm{Var},u}_{\bullet}$. Between finite and finite unary sets, there is no difference.
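The bounds Var(L) ≤ 1 for finite languages and Var(L) ≤ 2 for unary context-free languages, which cause the restricted domains mentioned above, can be illustrated by a small Python sketch. The dictionary representation of a grammar, the nonterminal names `S` and `T` (`T` plays the role of a second nonterminal), and the functions `finite_grammar`, `unary_grammar`, and `generate` are hypothetical choices for this illustration. The bounded search in `generate` is only meant for grammars like the two built here, where every sentential form contains at most one nonterminal and terminals are lowercase letters.

```python
def finite_grammar(words):
    """One nonterminal S with one rule S -> w for every word w."""
    return {"S": [tuple(w) for w in sorted(words)]}


def unary_grammar(ns, p, ms):
    """Rules for {a^n : n in ns} united with (a^p)* {a^m : m in ms}."""
    a = ("a",)
    return {
        "S": [a * n for n in ns] + [("T",)],
        "T": [a * p + ("T",)] + [a * m for m in ms],
    }


def generate(rules, start, max_len):
    """All terminal words of length <= max_len derivable from start."""
    words, seen, todo = set(), {(start,)}, [(start,)]
    while todo:
        form = todo.pop()
        # Position of the leftmost nonterminal, if there is one.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add("".join(form))
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for s in new if s not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return words


g_fin = finite_grammar({"ab", "ba", "b"})
g_un = unary_grammar([1], 3, [2, 4])
print(len(g_fin), len(g_un))  # 1 2 (numbers of nonterminals)
print(sorted(len(w) for w in generate(g_un, "S", 10)))  # [1, 2, 4, 5, 7, 8, 10]
```

The number of keys of the dictionary is the number of nonterminals of the grammar, and the total number of right-hand sides is its number of productions.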
We now consider the number of productions.
i) The number 1 and all numbers k with k > m + n + 2 are not in $g^{\mathrm{Prod}}_{\cup}(m, n)$. [remainder of the statement missing in the source text]
The essential differences between the cases of arbitrary, finite, and unary sets are:
- If we restrict to finite sets, then m + n + 1 and m + n + 2 are not in $g^{\mathrm{Prod},f}_{\cup}(m, n)$ and not in $g^{\mathrm{Prod},f,u}_{\cup}(m, n)$.
- If we restrict to the unary case, we only know that the numbers k ≥ min{m, n} are in $g^{\mathrm{Prod},u}_{\cup}(m, n)$ and $g^{\mathrm{Prod},f,u}_{\cup}(m, n)$; for "small" numbers we miss constructions.
For the star operation, we have the following results. [results missing in the source text]
We see that the only difference between arbitrary context-free and finite sets is that n + 2 is not contained in $g^{\mathrm{Prod},f}_{*}(n)$. We do not have non-trivial results for $g^{\mathrm{Prod},u}_{*}$ and $g^{\mathrm{Prod},f,u}_{*}$.
We now discuss the concatenation and restrict to the general case, because all our proofs require infinite languages and languages over an alphabet with at least two letters (i. e., we cannot present results on $g^{\mathrm{Prod},f}_{\cdot}$, $g^{\mathrm{Prod},u}_{\cdot}$, and $g^{\mathrm{Prod},f,u}_{\cdot}$).
Theorem 17. i) For all numbers m ≥ n ≥ 1, the number 0 and all numbers k with k ≥ m + n + 1 are not in $g^{\mathrm{Prod}}_{\cdot}(m, n)$. [remainder of the theorem missing in the source text]
With respect to the number of symbols, the situation is not very clear for small numbers n and m. We only give the results for "large" numbers; for a proof and further facts we refer to [17].

Conclusion
We have presented a summary of results concerning the operational complexity of the number of accepting states for regular languages and of the number of nonterminals, productions, and symbols for context-free languages.
For the number of accepting states and the operations union, set-subtraction, complement, and star, the results are complete, and we see that there are essentially no differences between arbitrary, finite, and unary sets. However, the finite unary sets behave completely differently. With respect to catenation, the situation may be the same, but a proof requires a complete determination of $g^{\mathrm{asc},f}_{\cdot}$, which is missing at present. For the intersection, we do not have enough information to make a statement on the comparison.
The situation is different for the syntactic measures of context-free languages. Concerning the number of nonterminals, we have a difference between arbitrary context-free languages and the restricted versions, but it comes from the very limited domain (of the variables m and n in the case of restrictions). If we restrict to "large" arguments (say m ≥ n ≥ 50, which can be justified by practical reasons), we have a good situation for the number of productions with respect to union and star, since the difference between arbitrary context-free sets and finite sets concerns only one or two "large" values. For the comparison of unary and finite unary languages, we need more information.
In order to get a more complete picture, it is necessary to determine the sets $g^{\mathrm{Prod},f}_{\cdot}$, $g^{\mathrm{Prod},u}_{\cdot}$, and all sets with a restriction and the measure Symb. Finally, we mention that it remains open to determine the sets $g^{\mathrm{asc}}_{\bullet}$ and their restricted versions for further operations such as reversal ($L^{R}$), squaring ($L^{2}$), quotients, etc., as well as the sets $g^{K}_{\bullet}$ with $K \in \{\mathrm{Var}, \mathrm{Prod}, \mathrm{Symb}\}$ for operations such as squaring, quotients, etc.