The Quotient Operation on Input-Driven Pushdown Automata

The quotient of a formal language K by another language L is the set of all strings obtained by taking a string from K that ends with a suffix from L, and removing that suffix. The quotient of a regular language by any language is always regular, whereas the context-free languages and many of their subfamilies, such as the linear and the deterministic languages, are not closed under the quotient operation. This paper establishes the closure of the family of languages recognized by input-driven pushdown automata (IDPDA), also known as visibly pushdown automata, under the quotient operation. A construction of automata representing the result of the operation is given, and its state complexity with respect to nondeterministic IDPDA is shown to be $m^2 n + O(m)$, where m and n are the numbers of states in the automata recognizing K and L, respectively.


Introduction
Let K and L be formal languages over some alphabet Σ. Then, the right-quotient of K by L is the following formal language, denoted by $K \cdot L^{-1}$.

$K \cdot L^{-1} = \{\, u \mid \exists v \in L : uv \in K \,\}$
The left-quotient operation is defined symmetrically.
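For finite languages, both quotients can be computed directly from the definition. The following is a minimal sketch; the function names `right_quotient` and `left_quotient` and the sample sets are illustrative assumptions, not notation from the paper.

```python
def right_quotient(K, L):
    """K . L^{-1}: strip a suffix v in L from every string of K that ends with v."""
    return {w[:len(w) - len(v)] for w in K for v in L if w.endswith(v)}


def left_quotient(L, K):
    """L^{-1} . K: strip a prefix v in L from every string of K that starts with v."""
    return {w[len(v):] for w in K for v in L if w.startswith(v)}


# Hypothetical sample languages, chosen only for illustration.
K = {"abab", "abb", "ba"}
L = {"b", "ab"}

print(sorted(right_quotient(K, L)))  # ['ab', 'aba']
print(sorted(left_quotient(L, K)))   # ['a', 'ab', 'b']
```

The sketch only covers finite sets of strings; for infinite languages the quotient has to be computed on automata, as done in the rest of the paper.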
The family of regular languages is closed under quotient with any language: as shown by Ginsburg and Spanier [6], if K is a regular language, then the languages $K \cdot L^{-1}$ and $L^{-1} \cdot K$ are both regular, regardless of L. For formal grammars, Ginsburg and Spanier [6] showed that for every context-free language K and every regular language L, their quotients are again context-free. On the other hand, if both arguments can be any context-free languages, then their quotient need not be context-free: indeed, for $K = a\{\, b^{\ell} a^{\ell} \mid \ell \geq 1 \,\}^*$ and $L = \{\, a^m b^{2m} \mid m \geq 1 \,\}^*$, their quotient satisfies $K^{-1} \cdot L \cap b^* = \{\, b^{2^n} \mid n \geq 1 \,\}$. Beyond mere non-closure, it is known that every recursively enumerable set is representable as a quotient of two context-free languages [8].
For an important subfamily of grammars, the LR(k) grammars, which are equivalently defined by deterministic pushdown automata (DPDA), it is known that they are closed under right-quotient with regular languages, but not closed under left-quotient with finite languages [5]. Another classical subfamily, the LL(k) grammars, is not closed under right-quotient or left-quotient with regular languages [18]. On the other hand, the family of languages recognized by pushdown automata with one stack symbol (the one-counter languages) is, perhaps surprisingly, closed under quotient [9].

This paper investigates the quotient operation for one of the most important subclasses of pushdown automata: the input-driven pushdown automata (IDPDA). These automata were introduced in the work of Mehlhorn [10] and of von Braunmühl and Verbeek [4], and are characterized by the following restriction: their input alphabet is split into three disjoint classes of symbols, on which the automaton must push one symbol onto the stack (left brackets), must pop one symbol off the stack (right brackets), or may not touch the stack (neutral symbols). The model defined by Mehlhorn [10] was deterministic (DIDPDA); von Braunmühl and Verbeek [4] introduced its nondeterministic variant (NIDPDA) and presented a novel determinization construction. Furthermore, Mehlhorn [10] and von Braunmühl and Verbeek [4] presented efficient algorithms for simulating these automata.
Later, Alur and Madhusudan [1] reintroduced IDPDA under the name of visibly pushdown automata and pointed out their applications to verification; their work revived the interest in the model. One of the theoretical contributions of Alur and Madhusudan [1] is the study of the succinctness of description by input-driven automata. In particular, they proved that determinizing an n-state NIDPDA requires $2^{\Theta(n^2)}$ states in the worst case, and initiated a systematic study of their closure properties.
In the follow-up work, the state complexity of the main language-theoretic operations on IDPDA was determined. The precise number of states necessary to represent concatenation, Kleene star and reversal by deterministic IDPDA (DIDPDA) was later determined by the authors [14]. For Boolean operations, the state complexity results were obtained by Han and Salomaa [7] and by Piao and Salomaa [16]. Recently, the authors [15] established the closure of IDPDA under the edit distance operation. For more details on the descriptional complexity of input-driven automata, an interested reader is directed to a fairly recent survey paper [12].

This paper investigates the quotient operation on IDPDA. The main result is that the family of languages recognized by IDPDA is closed under the quotient. If both argument languages consist only of well-nested strings, then so does their quotient, and the construction of an IDPDA for that quotient is straightforward. In the general case, without the well-nestedness condition, the closure is established by a more involved construction: given a pair of NIDPDA with m and n states, a construction of a $(3m + m^2 n)$-state NIDPDA recognizing their quotient is described in Section 3. The rest of the paper establishes a close lower bound for this construction. The general plan of the lower bound argument, explained in Section 4, is to construct witness languages of a special form, so that the task of constructing them is basically a problem of finding witness NFA (nondeterministic finite automata) for the state complexity of a certain unconventional operation on languages. This operation has been named palindromic quotient, and the NFA state complexity problem for it is solved in Section 5. The results are adapted to NIDPDA in the final Section 6.

Input-driven automata

The input alphabet of an input-driven pushdown automaton (IDPDA) [1,2,10] is split into three disjoint sets of left brackets $\Sigma_{+1}$, right brackets $\Sigma_{-1}$ and neutral symbols $\Sigma_0$. If the input symbol is a left bracket from $\Sigma_{+1}$, then the automaton always pushes one symbol onto the stack. For a right bracket from $\Sigma_{-1}$, the automaton must pop one symbol. Finally, for a neutral symbol in $\Sigma_0$, the automaton may not use the stack. In this paper, symbols from $\Sigma_{+1}$ and $\Sigma_{-1}$ shall be denoted by left and right angle brackets, respectively (<, >), whereas lower-case Latin letters from the beginning of the alphabet (a, b, c, ...) shall be used for symbols from $\Sigma_0$. Input-driven automata may be deterministic (DIDPDA) or nondeterministic (NIDPDA).
Under the original definition used by Mehlhorn [10] and by von Braunmühl and Verbeek [4], input-driven automata operate on input strings in which the brackets are well-nested. When an input-driven automaton reads a left bracket ($< \in \Sigma_{+1}$), it pushes a symbol onto the stack. This symbol is popped at the exact moment when the automaton encounters the matching right bracket ($> \in \Sigma_{-1}$). Thus, a computation of an input-driven automaton on any well-nested substring leaves the stack contents untouched.
For instance, in Figure 1, the fragment of the computation beginning in the state $q_4$ and ending in the state $q_{12}$ processes a well-nested substring b<<cd>e>, and therefore ends with the same stack contents as those with which it began (in this case, the empty stack).
The more general definition of input-driven automata proposed by Alur and Madhusudan [1] also allows ill-nested input strings, such as the whole string <a>>b<<cd>e><f in Figure 1. For every unmatched left bracket, the symbol pushed to the stack when reading this bracket is never popped, and remains in the stack to the end of the computation; in the figure, this is the case with the symbol s pushed in the state $q_{12}$. An unmatched right bracket is read with an empty stack: instead of popping a stack symbol, the automaton merely detects that the stack is empty and makes a special transition, which leaves the stack empty. The latter happens in the state $q_3$ in the figure, where the special transition upon an unmatched right bracket leads the automaton to the state $q_4$.

Definition 1 (von Braunmühl and Verbeek [4]; Alur and Madhusudan [1]). A nondeterministic input-driven pushdown automaton (NIDPDA) over an alphabet $\Sigma = (\Sigma_{+1}, \Sigma_{-1}, \Sigma_0)$ consists of
- a finite set Q of states, with a set of initial states $Q_0 \subseteq Q$ and a set of accepting states $F \subseteq Q$;
- a finite stack alphabet Γ, and a special symbol $\bot \notin \Gamma$ for the empty stack;
- for each neutral symbol $c \in \Sigma_0$, a transition function $\delta_c : Q \to 2^Q$ that gives the set of possible next states;
- for each left bracket symbol $< \in \Sigma_{+1}$, a function $\delta_< : Q \to 2^{Q \times \Gamma}$, which, for a given current state, provides a set of pairs (q, s), with $q \in Q$ and $s \in \Gamma$, where each pair means that the automaton enters the state q and pushes s onto the stack;
- for each right bracket symbol $> \in \Sigma_{-1}$, a function $\delta_> : Q \times (\Gamma \cup \{\bot\}) \to 2^Q$ specifying possible next states, assuming that the given stack symbol is popped from the stack (or that the stack is empty).

A configuration is a triple (q, w, x), with the current state $q \in Q$, remaining input $w \in \Sigma^*$ and stack contents $x \in \Gamma^*$. Possible next configurations are defined as follows.
- For a neutral symbol $c \in \Sigma_0$: $(q, cw, x) \vdash (q', w, x)$ for all $q' \in \delta_c(q)$.
- For a left bracket $< \in \Sigma_{+1}$: $(q, {<}w, x) \vdash (q', w, sx)$ for all $(q', s) \in \delta_<(q)$.
- For a right bracket $> \in \Sigma_{-1}$: $(q, {>}w, sx) \vdash (q', w, x)$ for all $q' \in \delta_>(q, s)$ if the stack is non-empty, and $(q, {>}w, \varepsilon) \vdash (q', w, \varepsilon)$ for all $q' \in \delta_>(q, \bot)$ if it is empty.

The language recognized by A is the set of all strings $w \in \Sigma^*$ on which the automaton, having begun its computation in a configuration $(q_0, w, \varepsilon)$ with $q_0 \in Q_0$, eventually reaches a configuration of the form $(q, \varepsilon, x)$, with $q \in F$ and with any stack contents $x \in \Gamma^*$.
An NIDPDA is deterministic (DIDPDA) if there is a unique initial state and every transition provides exactly one action.
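The definition can be made concrete with a small simulator that tracks all reachable configurations. This is only a sketch: the dictionary layout (`delta_neutral`, `delta_push`, `delta_pop`), the marker `BOTTOM` and the sample automaton are assumptions made for illustration, not part of the paper.

```python
BOTTOM = None  # stands for the empty-stack marker (the symbol "bottom" in Definition 1)


def nidpda_accepts(aut, word):
    """Simulate an NIDPDA on `word`, exploring every nondeterministic branch.

    A configuration is (state, stack); the stack is a tuple with its top at index 0.
    Acceptance is by accepting state, with any stack contents.
    """
    configs = {(q, ()) for q in aut["initial"]}
    for sym in word:
        nxt = set()
        for q, stack in configs:
            if sym in aut["neutral"]:
                for r in aut["delta_neutral"].get((q, sym), ()):
                    nxt.add((r, stack))                 # stack untouched
            elif sym in aut["left"]:
                for r, s in aut["delta_push"].get((q, sym), ()):
                    nxt.add((r, (s,) + stack))          # push exactly one symbol
            elif sym in aut["right"]:
                if stack:
                    for r in aut["delta_pop"].get((q, sym, stack[0]), ()):
                        nxt.add((r, stack[1:]))         # pop exactly one symbol
                else:
                    for r in aut["delta_pop"].get((q, sym, BOTTOM), ()):
                        nxt.add((r, stack))             # unmatched right bracket
        configs = nxt
    return any(q in aut["accepting"] for q, _ in configs)


# Hypothetical one-state automaton: accepts strings with no unmatched right bracket
# (there is no transition on an empty stack, so such a bracket kills the run).
example = {
    "left": {"<"}, "right": {">"}, "neutral": {"a"},
    "initial": {0}, "accepting": {0},
    "delta_neutral": {(0, "a"): {0}},
    "delta_push": {(0, "<"): {(0, "x")}},
    "delta_pop": {(0, ">", "x"): {0}},
}

print(nidpda_accepts(example, "<a><"))  # True
print(nidpda_accepts(example, "<a>>"))  # False
```

Storing whole stacks in the configuration set is exponential in general; it is used here only because it mirrors the definition literally.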

As shown by von Braunmühl and Verbeek [4], every n-state NIDPDA operating on well-nested strings can be transformed to a $2^{n^2}$-state DIDPDA. Alur and Madhusudan [1] proved that $2^{\Omega(n^2)}$ states are necessary in the worst case, and also extended the transformation to handle ill-nested inputs, with the resulting DIDPDA using $2^{2n^2}$ states.
For more details on input-driven automata and their complexity, the reader is directed to a recent survey [12].

Closure under the quotient
In this section, it is proved that the language family defined by input-driven automata is closed under the quotient operation.
For the class of regular languages, it is well-known that they are closed under quotient with any language. Indeed, if K is recognized by a deterministic finite automaton (DFA), then each state q of this DFA either has or does not have the property that the DFA, having started in q, accepts some string from L. Depending on this, q is relabelled as accepting or rejecting, and the resulting DFA recognizes exactly the quotient $K \cdot L^{-1}$.
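The relabelling argument becomes effective once L is given by an automaton. The sketch below assumes that L is recognized by an NFA and decides, for each DFA state q, whether some string of L leads from q to an accepting state, by a breadth-first search over pairs of states. The data layout and the names `accepts_some_suffix` and `quotient_final_states` are illustrative assumptions.

```python
from collections import deque


def accepts_some_suffix(q, dfa_delta, dfa_final, nfa):
    """True if some string accepted by `nfa` drives the DFA from q into dfa_final."""
    start = {(q, i) for i in nfa["initial"]}
    seen, queue = set(start), deque(start)
    while queue:
        d, s = queue.popleft()
        if d in dfa_final and s in nfa["final"]:
            return True
        for (src, c), targets in nfa["delta"].items():
            if src != s or (d, c) not in dfa_delta:
                continue
            for t in targets:
                pair = (dfa_delta[(d, c)], t)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return False


def quotient_final_states(states, dfa_delta, dfa_final, nfa):
    """New accepting set: the DFA with this set recognizes K . L^{-1}."""
    return {q for q in states if accepts_some_suffix(q, dfa_delta, dfa_final, nfa)}


# Hypothetical example: K = strings over {a, b} ending in "ab", L = {"b"}.
dfa_delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}
nfa_for_L = {"initial": {0}, "final": {1}, "delta": {(0, "b"): {1}}}

print(quotient_final_states({0, 1, 2}, dfa_delta, {2}, nfa_for_L))  # {1}
```

In the example, only state 1 (reached after a trailing a) becomes accepting, so the relabelled DFA recognizes the strings ending in a, which is exactly the quotient. For an arbitrary language L the relabelling still exists, but it need not be computable.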
Turning to input-driven automata, as long as all strings in L are well-nested, the same property still holds. That is, an n-state DIDPDA recognizing K can be transformed to an n-state DIDPDA recognizing the quotient $K \cdot L^{-1}$, simply by relabelling its states.
Given an arbitrary pair of NIDPDA, A and B, the goal is to construct a new NIDPDA C that recognizes their quotient, $L(A) \cdot L(B)^{-1}$. Whenever the automaton A accepts a string uv, and the other automaton B accepts the string v, the simulating automaton should therefore accept u. If none of the brackets in the u-part of uv match any brackets in the v-part, then the simulation proceeds as in the case of finite automata, without using any extra states. In the general case, the string u may have unmatched left brackets, v may have unmatched right brackets, and these brackets match each other in uv; thus, the computation of A may rely on the data transferred from u to v in the stack symbols. The simulating automaton C is given only u, with its unmatched left brackets, and while reading it, C has to guess the string v and imagine the computations of both A and B on this guessed v.
In the computation of C on u, these imaginary computations on v are traced backwards, so that whenever a left bracket (<) in u matches a right bracket (>) in v, the simulating automaton C, upon reading u up to that left bracket, tracks the imaginary computations of A and B that begin from the matching right bracket (>) in v and accept at the end of v. As C finishes reading the string u, its imaginary computations on v are backtracked to their beginning at the boundary between u and v. Then, at this point C ensures that B's computation is in its initial configuration, whereas the actual simulated computation of A on u smoothly continues into the imaginary computation of A on v. Thus, C finally verifies that the string v and the computations on it that it has been guessing actually do exist, and accordingly C accepts u.
This idea is implemented in the following construction.
Lemma 1. Let K be a language recognized by an NIDPDA A with the set of states P and with the pushdown alphabet Γ, and let L be another language recognized by an NIDPDA B with the set of states Q and with the pushdown alphabet Ω. Then, the quotient $K \cdot L^{-1}$ is recognized by an NIDPDA C with the set of states $(P \times \{0, 1\}) \cup P \cup (P \times P \times Q)$, whose pushdown alphabet includes the pairs in $\Gamma \times \{0, 1\}$, the triples in $\Gamma \times P \times Q$ and a special symbol #, all used in the proof below.

Proof (a sketch). In the first phase of the computation of C on an input string u, the simulation of the computations of A and B on its imaginary continuation v has not yet been started. This means that C assumes that all left brackets read so far are either going to have a matching right bracket in u, or are unmatched both in u and in v. Thus, in the first phase, C simply simulates the operation of A on a prefix of u, while maintaining a single extra bit of data: whether the stack is empty or not. This is represented in states of the form (p, d), where $p \in P$ is the state of A, and $d \in \{0, 1\}$, with d = 0 representing stack emptiness. While in these states, C uses stack symbols of the form (s, d), with $s \in \Gamma$ and $d \in \{0, 1\}$, which also carry the information on whether this stack symbol is at the bottom of the stack (d = 0). This allows the simulating automaton to enter a state of the form (p, 0) upon popping the last symbol from the stack, and thus always be aware of its stack's emptiness.
Every time C reads a left bracket (<), it nondeterministically guesses whether this bracket has a matching right bracket (>) in v. If C guesses that this is not the case, it pushes the same stack symbol as A would push (that is, C pushes (s, 0) or (s, 1), if A would push s), and continues its computation in a state of the form (p, 0) or (p, 1). If later, while still at the first phase, C encounters a matching right bracket and pops that symbol, it again behaves as A would do, remaining in a state from P × {0, 1}.
At some point, C may read a left bracket (<) and decide that it has a matching right bracket in v, so that A operating on uv would transfer some stack symbol s from the left bracket (<) to the right bracket (>). If this guess is correct, then this left bracket is unmatched in u, and thus C will never have a chance to pop the stack symbol it pushes at this moment; for that reason, it pushes a special stack symbol (#) that will cause immediate rejection if it is ever popped. At the same time, C guesses the computations of A and B on a suffix of v containing the matching right bracket (>) and the neighbouring well-nested substrings, and enters the second phase of the simulation in a state from P × P × Q.
In the second phase, C uses triples of the form $(p, \hat{p}, q)$ as states, and, while reading the input string u from left to right, it also guesses an imaginary string v from right to left, along with the computations of A and of B on that imaginary string. According to this plan, the first component of each triple, $p \in P$, is the state of the ongoing simulation of A on the prefix of u read so far. The other two components are the states of A and B processing v. To be precise, the second component, $\hat{p} \in P$, is a state, beginning from which A accepts a suffix of v guessed in the course of this simulation, whereas $q \in Q$ is a state of B, beginning from which it accepts the same guessed suffix of v.
When C nondeterministically decides to move to the second phase upon reading a left bracket (<), it guesses A's and B's computations on the last suffix of the imaginary second part of the string. If C's stack is empty, that is, if C is in a state (p, 0), then the last suffix of v is of the form x>y, where x is a well-nested string, the right bracket (>) following x is the one that matches the current left bracket (<) in u, and y is a concatenation of a descending string and an ascending string (that is, a concatenation of well-nested strings and right brackets, followed by well-nested strings and left brackets). All right brackets in y are then unmatched both in u and in the earlier part of v, and accordingly, C may enter any state $(p', \hat{p}, q)$ satisfying the following conditions:
1. upon reading this left bracket (<) in the state p, A pushes some stack symbol $s \in \Gamma$ and enters the state $p'$;
2. the automaton A, having begun its computation on x>y in the state $\hat{p}$ and with s on the stack, accepts;
3. the other automaton B, having begun its computation on x>y in the state q and with the empty stack, accepts as well.

In the other case, if C's stack is not empty, and it is therefore in a state (p, 1), the suffix of v is of the form x>y, where both x and y are well-nested, and the above three conditions remain the same.
Transitions of C in a state $(p, \hat{p}, q)$ are defined as follows. A right bracket (>) cannot be read in this state, and if it is encountered, C rejects.
Upon reading a neutral symbol $c \in \Sigma_0$, the simulation of A in the first component continues, while the last two components stay unchanged.
When reading a left bracket (<), the automaton C again has to guess whether this bracket has a matching right bracket (>) in v. In case it does, C pushes the stack symbol (#) that will cause rejection if popped, and advances the simulation in all three components of the state in the same way as it did when entering the second phase. On the other hand, if C nondeterministically guesses that this left bracket (<) has a matching bracket in u, it suspends the simulation of A and B on the imaginary suffix v, pushing a triple $(s, \hat{p}, q)$ onto the stack, where s is the stack symbol in the ongoing simulation of A on u. Then, C enters a state $p' \in P$ and begins processing the current well-nested substring of u in states from P, simulating only A.
When this well-nested substring ends, C reads the matching right bracket (>) in u and pops the triple $(s, \hat{p}, q)$ from the stack. Then, it resumes the second phase of the simulation in the state $(p', \hat{p}, q)$, where $p'$ is the next state in the ongoing simulation of A on u.
The precise correctness statement of the construction takes the following form. Assume that the simulating NIDPDA, after having read a string $t <_1 u_1 \ldots <_h u_h \in \Sigma^*$, where t is any string, $u_1, \ldots, u_h$ are well-nested strings and $<_1, \ldots, <_h$ are unmatched left brackets in this string, is in a state $(p, \hat{p}, q)$ and has stack contents $(s_h, \hat{p}_h, q_h) \ldots (s_1, \hat{p}_1, q_1)$. This means the following. First, there exists a computation of A on the string $t <_1 u_1 \ldots <_h u_h$ that pushes each symbol $s_i$ on the corresponding left bracket $<_i$, and reaches the state p after reading the whole string. Second, there exists a string of the form $v = v_h >_h \ldots v_1 >_1 w$, where $v_1, \ldots, v_h$ are well-nested strings, $>_1, \ldots, >_h$ are right brackets and $w \in \Sigma^*$ is any string that has no right brackets (>) matching any left brackets (<) in t, such that A, having begun its computation on v in the state $\hat{p}$ with the stack contents $s_h \ldots s_1$, is in the corresponding state $\hat{p}_i$ after popping at each right bracket $>_i$, and accepts in the end, whereas B, having begun its computation on the same string v in the state q and with the empty stack, is in the state $q_i$ after each right bracket $>_i$, and accepts the string as well.
The correctness statement can be proved by induction on the length of the computation.
Finally, accepting states are of the form $(p, p, q_0)$, where $q_0$ is an initial state of B: that is, A finishes reading u in the state p, A accepts v beginning in the same state p, and B accepts v beginning in the state $q_0$. Then, C recognizes exactly the desired quotient.
This proves the closure under right-quotient. Since the family of languages recognized by input-driven automata is closed under reversal (where, in the reversed string, left brackets become right brackets and vice versa [2]), the closure result also extends to the left-quotient operation.

Plan for a lower bound argument
The construction given in the previous section uses $3m + m^2 n$ states to represent the quotient, and it turns out that it cannot be much improved upon. A lower bound on the state complexity of the quotient of NIDPDA shall be proved using witness languages of the following general form.
Fix an alphabet of labels, Γ. The first language contains nested sequences of brackets with the matching brackets having identical labels; it is a subset of the following base language.

$K_0 = \{\, <_{a_1} \ldots <_{a_m} >_{a_m} \ldots >_{a_1} \mid m \geq 0,\ a_1, \ldots, a_m \in \Gamma \,\}$

All strings in the second language consist of right brackets (>), which are to be erased by the quotient operation. Thus, the second language is a subset of the following language.

$L_0 = \{\, >_{a_m} \ldots >_{a_1} \mid m \geq 0,\ a_1, \ldots, a_m \in \Gamma \,\}$

An automaton A recognizing a subset $K \subseteq K_0$ performs two tasks. First, upon reading each bracket $<_a$, it pushes the symbol a onto the stack, and upon reading a bracket $>_a$ it ensures that the symbol being popped is a; doing this task does not require any states. Second, it operates on the string as a DFA, ensuring that it belongs to a certain regular language.
The second automaton B recognizes a subset $L \subseteq L_0$ essentially as a DFA. Then, the quotient $K \cdot L^{-1}$ contains a string of the form $<_{a_1} \ldots <_{a_m}$ if and only if the whole string $<_{a_1} \ldots <_{a_m} >_{a_m} \ldots >_{a_1}$ is in K and its second half $>_{a_m} \ldots >_{a_1}$ belongs to L.
In order to construct efficient witness languages of this form, it is convenient to reformulate them in terms of finite automata, and to consider a related state complexity problem for finite automata. Let every left bracket ($<_a$) labelled with a symbol $a \in \Gamma$ be regarded as a symbol a, and let every right bracket ($>_a$) be regarded as $\overline{a}$, from a marked copy of the alphabet $\overline{\Gamma} = \{\, \overline{a} \mid a \in \Gamma \,\}$. Then the associated state complexity problem for finite automata over $\Gamma \cup \overline{\Gamma} \cup \{\#\}$ is concerned with the complexity of the following palindromic quotient operation on languages with respect to NFAs.

$\mathrm{PQ}(K, L) = \{\, a_1 \ldots a_m \mid a_1 \ldots a_m \# \overline{a_m} \ldots \overline{a_1} \in K,\ \overline{a_m} \ldots \overline{a_1} \in L \,\}$

Lemma 2. Let $K \subseteq \Gamma^* \# \overline{\Gamma}^*$ and $L \subseteq \overline{\Gamma}^*$ be any languages, and let $K'$ and $L'$ be the corresponding languages over the alphabet of brackets. Then:
1. if K is recognized by an m-state NFA, then $K'$ is recognized by an m-state NIDPDA;
2. if L is recognized by an n-state NFA, then $L'$ is recognized by an n-state NIDPDA;
3. if $K' \cdot (L')^{-1}$ is recognized by an N-state NIDPDA, then PQ(K, L) is recognized by an N-state NFA.

In particular, to prove the third part, one can directly transform an IDPDA recognizing the quotient $K' \cdot (L')^{-1}$ to an NFA recognizing the palindromic quotient PQ(K, L) by eliminating all transitions by right brackets and by ignoring all symbols pushed to the stack upon reading left brackets.

The lower bound for NFA
In order to apply Lemma 2, the task is now to determine the state complexity of the palindromic quotient operation with respect to NFAs. The tools for doing this are well-known.
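Lemma 3 (Fooling set lemma [3]). If a regular language L has a fooling set of cardinality k, then $\mathrm{nsc}(L) \geq k$, where nsc(L) denotes the minimal number of states of an NFA recognizing L.
Consider an alphabet $\Omega = \Sigma \cup \overline{\Sigma} \cup \{\#\}$, where $\Sigma = \{a, b, c\}$ and $\overline{\Sigma} = \{\overline{a}, \overline{b}, \overline{c}\}$. The lower bound for the state complexity of the operation PQ(·, ·) will be used for obtaining a lower bound for the state complexity of the quotient of input-driven languages. For this reason, the alphabet is partitioned into the sets Σ, $\overline{\Sigma}$ and {#}, which play the roles of left brackets, right brackets and neutral symbols, respectively.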

Lemma 4. If A is an NFA with n states and B is an NFA with m states, then the language PQ(L(A), L(B)) has an NFA with $n^2 \cdot m$ states.
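The automaton behind this bound, described in the proof that follows, runs A forwards and both A and B backwards in a single triple of states. The sketch below implements that idea and compares it with the definition of PQ by brute force. The encoding is an assumption made for illustration: NFAs are dictionaries with a set of transition triples, and the marked copy of a lowercase letter is written as its uppercase form.

```python
from itertools import product


def bar(sym):
    # Assumption of this sketch: the marked copy of a letter is its uppercase form.
    return sym.upper()


def nfa_accepts(N, word):
    cur = set(N["initial"])
    for c in word:
        cur = {r for (s, x, r) in N["delta"] if s in cur and x == c}
    return bool(cur & N["final"])


def pq_accepts(A, B, word):
    """Run the triple automaton (q1, q2, p) on an unmarked word."""
    cur = {(i, f, g) for i in A["initial"] for f in A["final"] for g in B["final"]}
    for a in word:
        nxt = set()
        for (q1, q2, p) in cur:
            fwd = {r for (s, x, r) in A["delta"] if s == q1 and x == a}          # A forwards
            back_a = {s for (s, x, r) in A["delta"] if r == q2 and x == bar(a)}  # A backwards
            back_b = {s for (s, x, r) in B["delta"] if r == p and x == bar(a)}   # B backwards
            nxt |= set(product(fwd, back_a, back_b))
        cur = nxt
    return any((q1, "#", q2) in A["delta"] and p in B["initial"] for (q1, q2, p) in cur)


def pq_by_definition(A, B, word):
    mirrored = "".join(bar(c) for c in reversed(word))
    return nfa_accepts(A, word + "#" + mirrored) and nfa_accepts(B, mirrored)


# Hypothetical two-state automata over {a, b, A, B, #}.
A = {"initial": {0}, "final": {0},
     "delta": {(0, "a", 1), (1, "a", 0), (0, "B", 1), (1, "B", 0),
               (0, "b", 0), (1, "b", 1), (0, "A", 0), (1, "A", 1), (0, "#", 0)}}
B = {"initial": {0}, "final": {0},
     "delta": {(0, "A", 1), (1, "A", 0), (0, "B", 0), (1, "B", 1)}}

print(pq_accepts(A, B, "aabb"), pq_accepts(A, B, "ab"))  # True False
```

The three components evolve independently on each input symbol, which is why the state count is the product of the two state sets of A with the state set of B.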
Proof. The language PQ(L(A), L(B)) can be recognized by an NFA C operating as follows. On input $w \in \Sigma^*$, C simulates in parallel (i) a computation of A from a start state to a state $q_1$, (ii) a computation of A in reverse, starting from a final state, on the string $\overline{w}$, ending in a state $q_2$, and (iii) a computation of B in reverse, starting from a final state, on the string $\overline{w}$, ending in a state p. Thus, the states of C are triples $(q_1, q_2, p)$, where $q_1, q_2$ are states of A and p is a state of B. A state $(q_1, q_2, p)$ is accepting if A has a transition on # from $q_1$ to $q_2$ and p is a start state of B. Now C has $n^2 \cdot m$ states and, by the choice of the final states, it is clear that L(C) = PQ(L(A), L(B)).

This upper bound is matched by a lower bound: there are languages K and L, recognized by NFAs with n and m states, respectively, for which every NFA recognizing PQ(K, L) has at least $n^2 \cdot m$ states.

Proof. Define

$K = \{\, u_0 \# u_1 \# \cdots \# u_k \mid k \geq 0,\ u_i \in (\Sigma \cup \overline{\Sigma})^*,\ |u_i|_a + |u_i|_{\overline{b}} \equiv 0 \pmod{n} \text{ for all } i \,\}$,
$L = \{\, v \in \overline{\Sigma}^* \mid |v|_{\overline{c}} \equiv 0 \pmod{m} \,\}$.

Note that the definition allows some of the substrings $u_i$ to be empty, which means that the strings of K may begin or end with # and have consecutive occurrences of #. The language K is recognized by an NFA $A = (\Omega, Q, 0, 0, \delta)$, where $Q = \{0, 1, \ldots, n - 1\}$ and the transitions of δ are defined by setting $\delta(i, a) = \delta(i, \overline{b}) = i + 1 \bmod n$, $\delta(i, x) = i$ for $x \in \{b, c, \overline{a}, \overline{c}\}$, and $\delta(0, \#) = 0$. The automaton A is, in fact, an incomplete DFA having a cycle of length n, where the cycle counts the sum of the numbers of symbols a and $\overline{b}$ modulo n. Transitions on # are defined only when the current sum has a value divisible by n. This means that A checks that in every substring between two occurrences of # the sum of the numbers of occurrences of a and $\overline{b}$ is divisible by n, and A recognizes exactly the language K.
It is clear that L has an NFA with a cycle of length m that simply verifies that the input is in $\{\overline{a}, \overline{b}, \overline{c}\}^*$ and counts the number of occurrences of the symbol $\overline{c}$ modulo m.
For establishing the lower bound for the nondeterministic state complexity of PQ(K, L), we define

$S = \{\, (a^i b^j c^k,\ a^{n-i} b^{n-j} c^{m-k}) \mid 0 \leq i, j \leq n - 1,\ 0 \leq k \leq m - 1 \,\}$.

The set S has cardinality $n^2 \cdot m$, and to prove the claim, by Lemma 3, it is sufficient to verify that S is a fooling set for PQ(K, L).
For any pair $(a^i b^j c^k, a^{n-i} b^{n-j} c^{m-k})$ of S we have $a^i b^j c^k \cdot a^{n-i} b^{n-j} c^{m-k} \in \mathrm{PQ}(K, L)$, because with $w = a^i b^j c^k a^{n-i} b^{n-j} c^{m-k}$ we have $w \# \overline{w}^R \in K$ and $\overline{w}^R \in L$ due to the observations that $|w|_a + |w|_{\overline{b}} = n$, $|\overline{w}^R|_a + |\overline{w}^R|_{\overline{b}} = n$ and $|\overline{w}^R|_{\overline{c}} = m$.
Next consider two distinct elements of S, $(a^i b^j c^k, a^{n-i} b^{n-j} c^{m-k})$ and $(a^r b^s c^t, a^{n-r} b^{n-s} c^{m-t})$, where $(i, j, k) \neq (r, s, t)$. Denote $w = a^i b^j c^k \cdot a^{n-r} b^{n-s} c^{m-t}$. If $k \neq t$, then $w \notin \mathrm{PQ}(K, L)$, because $|\overline{w}^R|_{\overline{c}} = k + m - t \not\equiv 0 \pmod{m}$. If $i \neq r$, then $|w|_a + |w|_{\overline{b}} = i + n - r \not\equiv 0 \pmod{n}$ and, consequently, $w \# \overline{w}^R \notin K$ and $w \notin \mathrm{PQ}(K, L)$. Similarly, if $j \neq s$, then $|\overline{w}^R|_a + |\overline{w}^R|_{\overline{b}} = j + n - s \not\equiv 0 \pmod{n}$ and again $w \# \overline{w}^R \notin K$. Hence S is a fooling set for PQ(K, L).
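The fooling set argument can be checked mechanically for small parameters. The sketch below rests on an assumption read off from the counting in the proof: a string w over {a, b, c} belongs to PQ(K, L) exactly when its numbers of a and of b are divisible by n and its number of c is divisible by m. The function names are illustrative.

```python
def in_pq(w, n, m):
    # Assumed membership test for PQ(K, L) with the witness languages:
    # the a's are counted before #, the marked b's after #, and the marked c's by L.
    return w.count("a") % n == 0 and w.count("b") % n == 0 and w.count("c") % m == 0


def fooling_pairs(n, m):
    """The set S from the proof, as a list of (x, y) pairs."""
    return [("a" * i + "b" * j + "c" * k,
             "a" * (n - i) + "b" * (n - j) + "c" * (m - k))
            for i in range(n) for j in range(n) for k in range(m)]


def is_fooling_set(pairs, member):
    """Check x.y in the language for every pair, and that no two pairs can be mixed."""
    for idx, (x, y) in enumerate(pairs):
        if not member(x + y):
            return False
        for (x2, y2) in pairs[idx + 1:]:
            if member(x + y2) and member(x2 + y):
                return False
    return True


n, m = 3, 2
S = fooling_pairs(n, m)
print(len(S), is_fooling_set(S, lambda w: in_pq(w, n, m)))  # 18 True
```

The check uses the standard fooling set condition, under which at least one of the two mixed strings must fall outside the language; the proof above establishes the stronger fact that neither mixed string belongs to PQ(K, L).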

The state complexity of the quotient
The results on the number of states in NIDPDA needed to represent the quotient are put together in the following theorem.
Theorem 2. In order to represent the quotient of an m-state NIDPDA by an n-state NIDPDA, it is sufficient to use an NIDPDA with $3m + m^2 n$ states. In the worst case, it is necessary to use at least $m^2 n$ states.
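As a quick check of the upper bound, assuming |P| = m and |Q| = n as in Lemma 1, the state set of C is a disjoint union of three parts, whose sizes add up as follows.

```latex
\[
  \bigl| (P \times \{0,1\}) \cup P \cup (P \times P \times Q) \bigr|
  = 2m + m + m^2 n
  = 3m + m^2 n .
\]
```

For instance, m = n = 2 gives 6 + 8 = 14 states, against the lower bound of $m^2 n = 8$.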
This gives the state complexity of $m^2 n + O(m)$. If the goal is to construct a deterministic automaton, one possible solution is to determinize the constructed NIDPDA. However, that would produce as many as $2^{\Theta(m^4 n^2)}$ states. Previously, for some operations, such as the concatenation, a much more succinct direct construction of a DIDPDA was defined [14] using the idea of computing behaviour functions of the given DIDPDA [11]. Investigating whether there is a significantly better construction of a DIDPDA for a quotient of two DIDPDAs is left as an open problem. A possible starting point is the DFA state complexity of the palindromic quotient operation defined in this paper.
Another open problem concerns the state complexity of the quotient for the intermediate unambiguous IDPDA model [13].