Branching Measures and Nearly Acyclic NFAs

To get a more comprehensive understanding of the branching complexity of nondeterministic finite automata (NFA), we introduce and study the string path width and depth path width measures. The string path width on a string $w$ counts the number of all complete computations on $w$, and the depth path width on an integer $\ell$ counts the number of complete computations on all strings of length $\ell$. We give an algorithm to decide the finiteness of the depth path width of an NFA. Deciding finiteness of string path width can be reduced to the corresponding question on ambiguity. An NFA is nearly acyclic if any computation can pass through at most one cycle. The class of nearly acyclic NFAs consists of exactly all NFAs with finite depth path width. Using this characterization we show that the finite depth path width of an $m$-state NFA over a $k$-letter alphabet is at most $(k+1)^{m-1}$ and that this bound is tight. The nearly acyclic NFAs recognize exactly the class of constant density regular languages.


Introduction
Finite automata are a fundamental model of computation that has been extensively studied since the 1950s. The last decades have seen much work on the descriptional complexity, or state complexity, of regular languages [8,9,25].
The degree of ambiguity of a nondeterministic finite automaton (NFA) A on a string w is the number of accepting computations of A on w. Ravikumar and Ibarra [19] were the first to systematically study the size trade-offs between NFAs of different degrees of ambiguity. Leung [15] has shown that general NFAs can be exponentially more succinct than polynomially ambiguous NFAs, and Hromkovič and Schnitger [11] have established a descriptional complexity separation between polynomially ambiguous and finitely ambiguous NFAs.
The degree of ambiguity is defined in terms of the number of accepting computations, and does not directly limit the total amount of nondeterminism in a computation. The computation of an unambiguous NFA may include an unbounded number of nondeterministic steps, as long as at each nondeterministic step, only one choice can lead to acceptance. The tree width (a.k.a. leaf size) measure counts the number of leaves of the computation tree [10,17,18]. Other measures of nondeterminism for finite automata have also been considered [6-8, 10, 18].
We study a measure called string path width that counts the number of complete accepting and non-accepting computations of an NFA on a given string. The string path width can be viewed as a blend of the tree width measure and the degree of ambiguity. For certain NFAs, the string path width is the same as tree width, and for others the same as ambiguity. In fact, Goldstine et al. [6] have defined 'ambiguity' as the number of complete computations, which coincides with our notion of string path width. The degree automata [13] extend these notions by considering the ratio of the number of accepting computations and the number of all computations on a given string.
To get a more comprehensive understanding of the degree of branching of an NFA, we introduce the depth path width measure, which counts the total number of complete computations on all inputs of a given length. We establish necessary and sufficient conditions for an NFA to have infinite depth path width. These conditions are based on the existence of cycles satisfying certain requirements. This characterization yields a polynomial time algorithm to decide whether or not the depth path width of an NFA is bounded. Finiteness of string path width can be decided with existing algorithms from the literature [24].
It is well known that acyclic finite automata characterize exactly the finite languages. We characterize regular languages having bounded depth path width by an extension of acyclic NFAs, called nearly acyclic NFAs. An NFA A is said to be nearly acyclic if, roughly speaking, it does not contain two distinct cycles where a state of one cycle is reachable from the other cycle.
We show that there exists an $m$-state nearly acyclic NFA over a $k$-letter alphabet having depth path width $(k+1)^{m-1}$, and that this is an upper bound for all $m$-state NFAs over a $k$-letter alphabet having finite depth path width. Finally, we show that nearly acyclic NFAs recognize exactly the regular languages of bounded density [21]. For nearly acyclic DFAs we have a stronger correspondence: any DFA recognizing a bounded density language must be nearly acyclic.

Preliminaries
Here we recall and introduce some notation and definitions. More information on finite automata can be found e.g. in [22,25]. The set of strings over a finite alphabet $\Sigma$ is $\Sigma^*$, and $\varepsilon$ is the empty string. The cardinality of a finite set $F$ is denoted $|F|$ and $\mathbb{N}$ is the set of non-negative integers.
A nondeterministic finite automaton (NFA) is a tuple $A = (Q, \Sigma, \delta, q_0, F)$ where $Q$ is the finite set of states, $\Sigma$ is the input alphabet, $\delta : Q \times \Sigma \to 2^Q$ is the transition function, $q_0 \in Q$ is the initial state and $F \subseteq Q$ is the set of final states. The transition function $\delta$ is extended in the usual way to a function $Q \times \Sigma^* \to 2^Q$, and the language recognized by $A$ is $L(A) = \{w \in \Sigma^* \mid \delta(q_0, w) \cap F \neq \emptyset\}$. If $|\delta(q, b)| \leq 1$ for all $q \in Q$ and $b \in \Sigma$, the automaton $A$ is a deterministic finite automaton (DFA). Note that we allow NFAs and DFAs to have undefined transitions. Our definition does not allow multiple start states or $\varepsilon$-transitions. Unless otherwise mentioned, we always assume that an NFA does not have any unreachable states.
A (state) path of the NFA $A$ with underlying string $w = b_1 b_2 \cdots b_k$, $b_i \in \Sigma$, $i = 1, \ldots, k$, $k \geq 0$, is a sequence of states $(p_0, p_1, \ldots, p_\ell)$, where $p_j \in \delta(p_{j-1}, b_j)$, $j = 1, \ldots, \ell$, and either $\ell = k$, or $\ell < k$ and $\delta(p_\ell, b_{\ell+1}) = \emptyset$. That is, the path must read the entire underlying string unless it encounters an undefined transition. Two paths are equal if and only if they have the same sequence of states and underlying string.
A path beginning in the start state $q_0$ is a computation of $A$ on the underlying string $w$. A computation $(q_0, p_1, \ldots, p_\ell)$ is a complete computation on a string $b_1 b_2 \cdots b_k$ if $\ell = k$. An accepting computation is a complete computation that ends in an accepting state of $F$. The set of all (not necessarily complete) computations of $A$ on the string $w$ is denoted $\mathrm{comp}_A(w)$.
Intuitively, a computation of A on a string w is a sequence of states that A reaches when started with the initial state and the symbols of w are read one by one. A complete computation ends with a state reached after consuming all symbols of w. An incomplete computation ends with a state where the transition on the next symbol of w is undefined.
A path $(p_0, p_1, \ldots, p_k)$, $k \geq 1$, with underlying string $b_1 b_2 \cdots b_k$ is a cycle if $p_0 = p_k$. A cycle with one transition from a state to itself is called a self-loop. (A path of length zero with no transitions is not a cycle.) An NFA with no cycles is called an acyclic NFA (aNFA).
Cycles that are obtained from each other by a cyclical shift are said to be equivalent: for $0 < i < k$, the above cycle (with $p_0 = p_k$) is equivalent to the cycle $(p_i, \ldots, p_k, p_1, \ldots, p_i)$.
We define path trees that represent all computations of an NFA on all strings of a given length. Note that this is different from the notion of computation trees [10,17], which represent all computations of an NFA on a given string w.
For $\ell \in \mathbb{N}$, the path tree of an NFA $A = (Q, \Sigma, \delta, q_0, F)$ of depth $\ell$, $T_{A,\ell}$, is a finite tree where the nodes are labelled by elements of $Q$ and the edges are labelled by elements of $\Sigma$, defined inductively as follows:
- $T_{A,0}$ consists of a single node labelled by $q_0$.
- Consider $\ell \geq 1$ and let $\mathrm{leaf}(\ell - 1)$ be the set of leaf nodes of $T_{A,\ell-1}$ having distance $\ell - 1$ from the root. If an $x \in \mathrm{leaf}(\ell - 1)$ is labelled by $q \in Q$, then for each $c \in \Sigma$ and $q' \in \delta(q, c)$, in the tree $T_{A,\ell}$ we add to node $x$ a child $y$ labelled by $q'$, and the edge between $x$ and $y$ is labelled with $c$.
The pruned path tree of depth $\ell$, $T^p_{A,\ell}$, is obtained from $T_{A,\ell}$ by recursively removing all leaf nodes which have distance smaller than $\ell$ from the root node.
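The inductive definition of path trees and their pruning can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the text: the transition function is encoded as a dictionary from (state, symbol) pairs to sets of states, trees are nested tuples, and the function names and the three-state example automaton are hypothetical.

```python
def path_tree(delta, alphabet, state, depth):
    """Path tree of the given depth rooted at `state`,
    encoded as (state, [(symbol, subtree), ...])."""
    if depth == 0:
        return (state, [])
    children = []
    for c in alphabet:
        for q in sorted(delta.get((state, c), ())):
            children.append((c, path_tree(delta, alphabet, q, depth - 1)))
    return (state, children)


def prune(tree, depth):
    """Remove every branch that ends at distance < depth from the root.
    Returns None if nothing survives."""
    state, children = tree
    if depth == 0:
        return (state, [])
    kept = []
    for c, child in children:
        sub = prune(child, depth - 1)
        if sub is not None:
            kept.append((c, sub))
    return (state, kept) if kept else None


def count_leaves(tree):
    """Number of leaves of a tree (None encodes the empty tree)."""
    if tree is None:
        return 0
    _, children = tree
    return 1 if not children else sum(count_leaves(sub) for _, sub in children)


# Hypothetical example: state 0 has a self-loop on 'a', and 0 -a-> 1 -b-> 2.
delta = {(0, "a"): {0, 1}, (1, "b"): {2}}
tree = path_tree(delta, "ab", 0, 3)
print(count_leaves(tree))            # 4 leaves in the path tree of depth 3
print(count_leaves(prune(tree, 3)))  # 3 leaves remain after pruning
```

In the example, the leaf labelled 2 at distance 2 is removed by pruning, and so is its parent, which then becomes a leaf at distance 1.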
The degree of ambiguity of an NFA $A$ on a string $w$, $\mathrm{da}(A, w)$ [8,19], is the number of accepting computations of $A$ on $w$, and the tree width of $A$ on $w$, $\mathrm{tw}(A, w)$ [10,17], is the number of (not necessarily complete) computations of $A$ on $w$. Note that Hromkovič et al. [10] call this "leaf size". Tree width is usually defined as the number of leaves of the computation tree of $A$ on $w$. This quantity is identical to the cardinality of the set $\mathrm{comp}_A(w)$.
For $\ell \geq 0$, the degree of ambiguity (respectively, tree width) of $A$ on strings of length $\ell$ is defined as $\mathrm{da}(A, \ell) = \max\{\mathrm{da}(A, w) \mid w \in \Sigma^\ell\}$ (respectively, $\mathrm{tw}(A, \ell) = \max\{\mathrm{tw}(A, w) \mid w \in \Sigma^\ell\}$). Strictly speaking, following common practice, we use $\mathrm{da}(A, \cdot)$ (and $\mathrm{tw}(A, \cdot)$) to denote two different functions where one takes a string and the other an integer as argument.
The ambiguity (respectively, the tree width) of the NFA $A$ is said to be finite if the above values are bounded for all $\ell \in \mathbb{N}$, and in this case, the degree of ambiguity (respectively, the tree width) of $A$ is denoted $\mathrm{da}^{\sup}(A)$ (respectively, $\mathrm{tw}^{\sup}(A)$).
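As a small illustration of these two measures, the following Python sketch enumerates the set of computations of an NFA on a string and derives the tree width and the degree of ambiguity from it. The dictionary encoding of the transition function, the function names, and the example automaton are assumptions made for this sketch only; the enumeration is exponential in general and is meant for tiny examples.

```python
def computations(delta, q0, w):
    """Yield every computation of the NFA on w as a tuple of states."""
    stack = [(q0,)]
    while stack:
        path = stack.pop()
        consumed = len(path) - 1  # number of symbols of w read so far
        if consumed < len(w):
            successors = delta.get((path[-1], w[consumed]), set())
        else:
            successors = set()
        if not successors:
            # Either all of w was read (complete computation)
            # or the next transition is undefined (incomplete computation).
            yield path
        for p in successors:
            stack.append(path + (p,))


def tree_width(delta, q0, w):
    """Number of all computations (complete or not) on w."""
    return sum(1 for _ in computations(delta, q0, w))


def degree_of_ambiguity(delta, q0, finals, w):
    """Number of complete computations on w that end in a final state."""
    return sum(1 for path in computations(delta, q0, w)
               if len(path) == len(w) + 1 and path[-1] in finals)


# Hypothetical example: state 0 has a self-loop on 'a', and 0 -a-> 1 -b-> 2.
delta = {(0, "a"): {0, 1}, (1, "b"): {2}}
print(tree_width(delta, 0, "aab"))                # 3
print(degree_of_ambiguity(delta, 0, {2}, "aab"))  # 1
```

On the string aab the example has the computations (0, 0, 0) and (0, 1), both incomplete, and the accepting computation (0, 0, 1, 2).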

String Path Width and Depth Path Width
We consider measures that count the number of complete computations on a given string and on all strings of given length, respectively. In the following, $A = (Q, \Sigma, \delta, q_0, F)$ is always an NFA. The string path width of $A$ on a string $w \in \Sigma^*$, $\mathrm{SPW}(A, w)$, is defined as the number of complete computations of $A$ on $w$. For $\ell \in \mathbb{N}$, the string path width of $A$ on strings of length $\ell$ is $\mathrm{SPW}(A, \ell) = \max\{\mathrm{SPW}(A, w) \mid w \in \Sigma^\ell\}$, and when this value is bounded, the string path width of $A$ is denoted $\mathrm{SPW}^{\sup}(A)$. In fact, Goldstine et al. [6] have defined 'ambiguity' as the number of complete computations, which coincides with our notion of string path width. The string path width can be viewed as a blend between ambiguity and tree width in the sense of the following lemma.
Since string path width counts only complete computations while tree width counts all computations, the string path width of an NFA $A$ on a string $w$ will always be at most the tree width of $A$ on $w$.
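The string path width on a single string can be computed without listing computations, by propagating path counts symbol by symbol. The sketch below illustrates this under assumptions of ours: the dictionary encoding of the transition function and the helper names are hypothetical, and the maximum over all strings of a given length is taken by brute force, which is exponential in the length.

```python
from itertools import product


def spw_string(delta, q0, w):
    """SPW(A, w): number of complete computations on w."""
    count = {q0: 1}  # state -> number of paths reaching it on the prefix read so far
    for c in w:
        nxt = {}
        for q, n in count.items():
            for p in delta.get((q, c), ()):
                nxt[p] = nxt.get(p, 0) + n
        count = nxt
    return sum(count.values())


def spw_length(delta, alphabet, q0, length):
    """SPW(A, length): maximum of SPW(A, w) over all strings w of that length."""
    return max(spw_string(delta, q0, "".join(w))
               for w in product(alphabet, repeat=length))


# Hypothetical example: state 0 has a self-loop on 'a', and 0 -a-> 1 -b-> 2.
delta = {(0, "a"): {0, 1}, (1, "b"): {2}}
print(spw_string(delta, 0, "aab"))     # 1
print(spw_string(delta, 0, "aaa"))     # 2
print(spw_length(delta, "ab", 0, 3))   # 2
```

On aaa the example automaton has two complete computations, while its tree width on the same string is 4, in line with the inequality between the two measures.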
Since string path width is, in the sense of Lemma 1 (iii), a special case of degree of ambiguity, from algorithms and bounds for ambiguity we get corresponding results for string path width. This is established using the transformation of the following lemma. In general, the transformed automaton is not equivalent to the original. Note that Lemma 1 (ii) gives a correspondence between string path width and tree width, but this cannot be used in a similar way because the corresponding transformation changes the string path width of the NFA. Also, it is known that for a fixed $k$ and a given NFA $A$ it can be decided in polynomial time whether $\mathrm{da}^{\sup}(A)$ (and consequently whether $\mathrm{SPW}^{\sup}(A)$) is at least $k$, but the question for degree of ambiguity becomes PSPACE-complete if $k$ is part of the input [3].
Next we introduce the depth path width of an NFA as the number of all complete computations of a given length. This metric can be viewed as a broader version of the string path width; while the string path width counts the number of computations on a specific string, the depth path width considers all strings of the same length.
Directly from the definition it follows that for NFAs over a unary alphabet, the notion of depth path width coincides with string path width.
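Since the depth path width sums complete computations over all strings of one length, it equals the number of labelled walks of that length starting in the initial state, which a simple dynamic program can count. The following Python sketch assumes a dictionary encoding of the transition function; the function name and both example automata are hypothetical.

```python
def dpw(delta, alphabet, q0, length):
    """DPW(A, length): number of complete computations on all strings of that length."""
    count = {q0: 1}  # state -> number of labelled walks from q0 ending there
    for _ in range(length):
        nxt = {}
        for q, n in count.items():
            for c in alphabet:
                for p in delta.get((q, c), ()):
                    nxt[p] = nxt.get(p, 0) + n
        count = nxt
    return sum(count.values())


# Hypothetical binary example: state 0 has a self-loop on 'a', and 0 -a-> 1 -b-> 2.
delta = {(0, "a"): {0, 1}, (1, "b"): {2}}
print([dpw(delta, "ab", 0, n) for n in range(5)])   # [1, 2, 3, 3, 3]

# Hypothetical unary example: self-loops on 0 and 1, and 0 -a-> 1.
unary = {(0, "a"): {0, 1}, (1, "a"): {1}}
print([dpw(unary, "a", 0, n) for n in range(5)])    # [1, 2, 3, 4, 5]
```

For the unary automaton there is only one string of each length, so the printed values are also its string path widths, and they grow without bound.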
We give the necessary and sufficient conditions for an NFA to have unbounded depth path width. For this we use the correspondence between depth path width and the number of leaves in path trees (defined in section 2).

Lemma 3.
Consider an NFA $A$ and $\ell \in \mathbb{N}$. The value $\mathrm{DPW}(A, \ell)$ is equal to the number of leaves of the pruned path tree $T^p_{A,\ell}$.

Theorem 1. The depth path width of an NFA $A = (Q, \Sigma, \delta, q_0, F)$ is infinite if and only if there exist $q_1, q_2, q_3 \in Q$ and $a, b \in \Sigma$, where $q_2 \neq q_3$ or $a \neq b$, such that (i) $q_2 \in \delta(q_1, a)$ and state $q_1$ is reachable from $q_2$, and (ii) $q_3 \in \delta(q_1, b)$ and the language of the NFA $A' = (Q, \Sigma, \delta, q_3, Q)$ is infinite.

Intuitively, the conditions of Theorem 1 mean that $q_1$ and $q_2$ belong to a cycle and the state $q_1$ has another transition to a state $q_3$ such that the computations originating from $q_3$ are defined on infinitely many strings. Here $q_3$ may or may not belong to the same cycle as $q_1$ and $q_2$. If $q_2 = q_3$, then the alphabet symbols $a$ and $b$ must be distinct.
Proof. First assume that conditions (i) and (ii) hold. Let $C_1$ be a computation from $q_0$ to $q_1$ (recall that we assume that NFAs have no unreachable states). Let $C_2$ be a cycle from $q_1$ back to $q_1$ that begins with the transition on $a$ to $q_2$.
To show that $\mathrm{DPW}^{\sup}(A)$ is infinite, it is sufficient to show that for all $M \in \mathbb{N}$ there exists $\ell$ such that $\mathrm{DPW}(A, \ell) \geq M$. By condition (ii) there exists a path $C_M$ having length $M \cdot |C_2|$ that begins in $q_1$ with the transition on $b$ to $q_3$. Now $A$ has $M$ different computations of length $|C_1| + M \cdot |C_2|$:
$$C_1 \, C_2^{\,i} \, D_i, \quad i = 1, \ldots, M,$$
where $D_i$ is an initial part of the path $C_M$ having length $(M - i) \cdot |C_2|$. Note that the above are all distinct computations because the transitions from $q_1$ to $q_2$ on $a$ and from $q_1$ to $q_3$ on $b$ are distinct.
We sketch the proof in the "only if" direction: If $\mathrm{DPW}^{\sup}(A)$ is infinite, using Lemma 3 we see that the number of leaves of the pruned path tree $T^p_{A,\ell}$ can be chosen arbitrarily large for sufficiently large $\ell$. When some state of $A$ repeats on a path from the root to a leaf, we get a cycle and states satisfying conditions (i) and (ii).
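Conditions (i) and (ii) can be tested directly with reachability computations, which gives one possible polynomial time decision procedure. The Python sketch below is our own illustration and not necessarily the algorithm intended by the authors. It assumes a dictionary encoding of the transition function and uses the observation that, since every state of the automaton started in $q_3$ is final, its language is infinite exactly when some cycle is reachable from $q_3$. The function name is hypothetical.

```python
def has_infinite_dpw(delta, q0):
    """Check conditions (i) and (ii) on the states reachable from q0."""
    succ = {}  # state -> list of labelled transitions (symbol, target)
    for (q, c), targets in delta.items():
        for p in targets:
            succ.setdefault(q, []).append((c, p))

    def reach(start):
        """States reachable from `start` by zero or more transitions."""
        seen, stack = {start}, [start]
        while stack:
            q = stack.pop()
            for _, p in succ.get(q, []):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    states = reach(q0)
    closure = {q: reach(q) for q in states}
    # A state lies on a cycle iff one of its successors can reach it again.
    on_cycle = {q for q in states
                if any(q in closure[p] for _, p in succ.get(q, []))}
    # With all states final, the language from q is infinite iff a cycle is reachable.
    infinite_from = {q for q in states if closure[q] & on_cycle}

    for q1 in states:
        transitions = succ.get(q1, [])
        for i, (a, q2) in enumerate(transitions):
            if q1 not in closure[q2]:
                continue  # condition (i) fails for this transition
            for j, (b, q3) in enumerate(transitions):
                # i != j means q2 != q3 or a != b; then test condition (ii).
                if i != j and q3 in infinite_from:
                    return True
    return False


print(has_infinite_dpw({(0, "a"): {0, 1}, (1, "b"): {2}}, 0))  # False
print(has_infinite_dpw({(0, "a"): {0}, (0, "b"): {0}}, 0))     # True
```

The first example has a single self-loop whose only exit leads to finitely many strings. The second has two distinct self-loops on the same state, so its depth path width doubles with every additional symbol.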

Depth Path Width of Nearly Acyclic NFAs
We want to derive an upper bound for the finite depth path width of an m-state NFA. First we develop bounds for the depth path width measure of acyclic NFAs where the depth path width is naturally guaranteed to be finite.
Note that the result of Proposition 1 indicates that the largest possible depth path width of an m-state aNFA is obtained by strings of length, roughly, m divided by two.
We now extend the result to arbitrary alphabet sizes.
Theorem 3. Let A be an m-state aNFA. Then

The upper bound can be improved for acyclic DFAs (aDFA).

Corollary 2.
For an aDFA $D$ with $m$ states and $k$ alphabet characters, the depth path width of $D$ is at most $k^{m-1}$.
It is easy to verify that an NFA A does not satisfy the conditions of Theorem 1 if and only if A does not have two non-equivalent cycles where one is reachable from the other. (Two cycles are equivalent if they are obtained from each other by a cyclical shift, see section 2.) This condition forms the basis for the following definition.
Definition 1. An NFA $A$ is nearly acyclic (naNFA) if it does not have two non-equivalent cycles, $C_1$ and $C_2$, such that a state of $C_2$ is reachable from a state of $C_1$. An naNFA with a deterministic transition function is called a nearly acyclic DFA (naDFA).
By Theorem 1, Definition 1 gives the most general class of NFAs that have finite depth path width. The influence of cycles that are reachable from one another is considered in a more general way by Msiska and van Zijl [16].
The limitation on the reachability between cycles implies a limitation on the number of (non-equivalent) cycles in a nearly acyclic NFA. The naNFAs with a maximal number of acyclic transitions and one self-loop on the initial state turn out to be useful for obtaining bounds for depth path width.
Definition 2. An m-state initial self-loop maximal nearly acyclic NFA, an imax-naNFA, over an alphabet Σ has the set of states {0, 1, . . . , m − 1} where 0 is the start state, there exists a transition on each alphabet symbol from i to j for all 0 ≤ i < j ≤ m − 1, and 0 has a self-loop.
The transitions of an imax-naNFA are uniquely determined, except for the self-loop on the initial state, which can be on an arbitrary element of Σ. (If needed we could specify the symbol labelling the self-loop.) Also, for purposes of depth path width, the set of final states can be arbitrary. In Figure 3 illustrating an m-state imax-naNFA, we use m − 1 as the only final state.
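To make Definition 2 concrete, the following Python sketch builds an imax-naNFA and counts its complete computations of each length by dynamic programming. The dictionary encoding, the function names, and the choice of 'a' as the self-loop symbol are assumptions of this sketch. For $m = 4$ and $k = 2$ the counts stabilize at 27, which is consistent with the value $(k+1)^{m-1}$ stated in the introduction.

```python
def imax_nanfa(m, alphabet, loop_symbol):
    """Transition function of an m-state imax-naNFA with start state 0."""
    delta = {}
    for i in range(m - 1):
        for c in alphabet:
            # A transition on every symbol from i to every larger state j.
            delta[(i, c)] = set(range(i + 1, m))
    # The single self-loop on the start state.
    delta.setdefault((0, loop_symbol), set()).add(0)
    return delta


def dpw(delta, alphabet, q0, length):
    """Number of complete computations on all strings of the given length."""
    count = {q0: 1}
    for _ in range(length):
        nxt = {}
        for q, n in count.items():
            for c in alphabet:
                for p in delta.get((q, c), ()):
                    nxt[p] = nxt.get(p, 0) + n
        count = nxt
    return sum(count.values())


m, alphabet = 4, "ab"
delta = imax_nanfa(m, alphabet, "a")
print([dpw(delta, alphabet, 0, n) for n in range(6)])  # [1, 7, 19, 27, 27, 27]
print((len(alphabet) + 1) ** (m - 1))                  # 27
```

The count for length $n$ is $\sum_{j=0}^{\min(n, m-1)} \binom{m-1}{j} k^j$: a computation stays in state 0 on the self-loop and then takes $j$ strictly ascending steps, each on any of the $k$ symbols.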
We calculate the depth path width of imax-naNFAs as a function of the number of states and alphabet size.

Since acyclic DFAs are a special case of nearly acyclic DFAs, we can use the value acquired in Corollary 2 as a lower limit on the upper bound for the depth path width of an naDFA.

Lemma 5 gives the depth path width of imax-naNFAs. From Lemma 4 we recall that an naNFA can have multiple cycles; however, it seems plausible that an $m$-state imax-naNFA could have maximal depth path width among all $m$-state naNFAs. This is established in the following lemmas.

Consider an $m$-state naNFA $B$ where all cycles are self-loops. We can define an injective mapping from the set of computations of $B$ having length $\ell$ to the length-$\ell$ computations of an $m$-state imax-naNFA $A$. This then implies that the depth path width of $B$ is at most that of $A$, and the observation is the basis for the following lemma.

Proof. By Lemma 6, $A$ can be converted to an $m$-state naNFA $A'$ over the same alphabet, without decreasing the depth path width, where all cycles in $A'$ are self-loops. Let $B_{\mathrm{imax}}$ be an $m$-state imax-naNFA over the same alphabet. Now
$$\mathrm{DPW}^{\sup}(A) \leq \mathrm{DPW}^{\sup}(A') \leq \mathrm{DPW}^{\sup}(B_{\mathrm{imax}}) = (k+1)^{m-1},$$
where $k$ is the size of the alphabet, the second inequality follows from Lemma 7 and the equality from Lemma 5. The equality also establishes the second claim of the theorem.

Languages recognized by naNFAs
Acyclic NFAs recognize the family of finite languages and, similarly, the nearly acyclic NFAs recognize a proper subfamily of the regular languages. The density of a language $L \subseteq \Sigma^*$ is defined as the function $d_L(\ell) = |L \cap \Sigma^\ell|$, $\ell \in \mathbb{N}$.

Proposition 2. (Shallit [21]) The density of a regular language $L$ over $\Sigma$ is bounded, that is $d_L(\ell) \in O(1)$, if and only if $L$ can be represented as a finite union of regular expressions $xy^*z$, where $x, y, z \in \Sigma^*$.

The nearly acyclic NFAs recognize exactly the constant density languages.

Theorem 6. A regular language $L$ has constant density if and only if $L$ is recognized by a nearly acyclic NFA.
Proof. Suppose that $L \subseteq \Sigma^*$ is recognized by an $m$-state naNFA $A$. We show that $d_L(\ell) \leq m^3 \cdot |\Sigma|^m$ for all $\ell \in \mathbb{N}$. For $\ell \leq m - 1$ there is nothing to prove.
Consider then strings of length $\ell \geq m$. For each $w \in \Sigma^\ell$ accepted by $A$, fix one accepting computation $C_w$. Since $A$ is nearly acyclic and $\ell \geq m$, the computation $C_w$ must pass through exactly one cycle. Thus, we can write $w = w_{\mathrm{pref}} w_{\mathrm{cyc}} w_{\mathrm{suf}}$ where $w_{\mathrm{cyc}}$ is the maximal substring of $w$ that in the computation $C_w$ is "processed" by transitions of the cycle, and $|w_{\mathrm{pref}} \cdot w_{\mathrm{suf}}| \leq m - 1$. The number of strings of length at most $m - 1$ is upper bounded by $|\Sigma|^m$. In a string of length at most $m - 1$ the cycle can occur in at most $m$ locations and, according to Lemma 4, $A$ has at most $m$ cycles and, furthermore, each cycle (equivalence class) can be started in at most $m$ positions. Once a particular cycle and its position in the "acyclic part" of the computation (consuming the prefix $w_{\mathrm{pref}}$ and suffix $w_{\mathrm{suf}}$) are chosen, the length of the computation in the cycle is determined by the total length $\ell$. Thus, the number of accepted strings of length $\ell$ is upper bounded by the constant $m^3 \cdot |\Sigma|^m$.
Conversely, if $L$ has constant density then, by Proposition 2, $L$ can be represented as a finite union of regular expressions of the form $xy^*z$, $x, y, z \in \Sigma^*$. An naNFA with one cycle recognizes $xy^*z$, and the languages recognized by naNFAs are clearly closed under union.
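The density of the language of a small NFA can be checked numerically. Because density counts distinct strings rather than computations, the Python sketch below tracks, for each reachable subset of states, how many strings lead to exactly that subset (an on-the-fly subset construction). The dictionary encoding, the function name, and the example automaton for $a(ab)^*b$ are assumptions made for this illustration.

```python
def density(delta, alphabet, q0, finals, length):
    """Number of distinct strings of the given length accepted by the NFA."""
    # Each string leads to exactly one subset of states, so the counts
    # below partition the strings with at least one complete computation.
    counts = {frozenset([q0]): 1}
    for _ in range(length):
        nxt = {}
        for subset, n in counts.items():
            for c in alphabet:
                target = frozenset(p for q in subset
                                   for p in delta.get((q, c), ()))
                if target:
                    nxt[target] = nxt.get(target, 0) + n
        counts = nxt
    return sum(n for subset, n in counts.items() if subset & finals)


# Hypothetical naNFA with one cycle for x y* z with x = a, y = ab, z = b.
delta = {(0, "a"): {1}, (1, "a"): {2}, (2, "b"): {1}, (1, "b"): {3}}
print([density(delta, "ab", 0, {3}, n) for n in range(7)])  # [0, 0, 1, 0, 1, 0, 1]

# Two distinct self-loops on one state: not nearly acyclic, density 2^n.
free = {(0, "a"): {0}, (0, "b"): {0}}
print([density(free, "ab", 0, {0}, n) for n in range(4)])   # [1, 2, 4, 8]
```

The first automaton accepts at most one string of each length, in line with the constant density of $xy^*z$, while the second accepts every string and so has unbounded density.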
By considering unary regular languages it is easy to see that a constant density language can be recognized by an NFA that is not nearly acyclic. However, for DFAs, we get the implication also in the converse direction.
Theorem 7. Any DFA recognizing a constant density language must be nearly acyclic.
As a corollary, we get that determinizing an naNFA must result in a nearly acyclic DFA. This could of course also be seen using a direct construction but it would require some effort.
Corollary 3. Let A be an naNFA and let D be the DFA obtained from A using the subset construction. Then D is nearly acyclic.

Conclusion
We have given an algorithm to decide whether the depth path width of an NFA is unbounded, and characterized automata with bounded depth path width as the class of nearly acyclic NFAs. We have given an upper bound for the finite depth path width of an m-state NFA over an alphabet of size k and shown that this bound is tight.
Nearly acyclic NFAs extend the class of acyclic NFAs that characterize the class of finite languages. A tight state complexity bound for determinizing acyclic NFAs is known [20]. From Corollary 3 we know that determinizing a nearly acyclic NFA always results in a nearly acyclic DFA. Establishing the worst-case size blow-up of determinizing a nearly acyclic NFA is a topic for future research. The size blow-up is at least as great as the exponential lower bound for determinizing unary (nearly acyclic) NFAs having cycles of different prime lengths [4].
Minimization of NFAs is PSPACE-complete [9] and remains NP-hard even for restricted subclasses of acyclic NFAs [1]. A linear time minimization algorithm for acyclic DFAs is given by Bubenzer [2] and incremental minimization techniques for acyclic NFAs have been considered e.g. by Lamperti et al. [14]. A topic for future work could be also to extend such methods for nearly acyclic NFAs.