On the Degree of Nondeterminism of Tree Adjoining Languages and Head Grammar Languages

Abstract. The degree of nondeterminism is a measure of syntactic complexity which was investigated for parallel and sequential rewriting systems. In this paper, we consider the degree of nondeterminism for tree adjoining grammars and their languages and head grammars and their languages. We show that a degree of nondeterminism of 2 suffices for both formalisms in order to generate all languages in their respective language families. Furthermore, we show that deterministic tree adjoining grammars (those with degree of nondeterminism equal to 1) can generate non-context-free languages, in contrast to deterministic head grammars, which can only generate languages containing a single word.


Introduction
The degree of nondeterminism for tabled Lindenmayer systems and languages has been studied in [9] and [8] as a measure of syntactic complexity. The degree of nondeterminism has also been considered for sequential rewriting systems in [2] and [3]. The degree of nondeterminism is usually defined as the maximal number of production rules with the same left-hand side, which provides a measure of the amount of choice available during derivations using the grammar. In this paper we consider the degree of nondeterminism for tree adjoining grammars and head grammars. Tree adjoining grammars (TAGs) were first introduced in [5] and their formal properties and linguistic relevance have been considered in [4] and [10]. TAGs are tree-generating grammars which use an adjoining operation that generates new trees by joining and attaching two different trees at a particular node. Head Grammars were first introduced in [7]. The principal feature which distinguishes a Head Grammar (HG) from a context-free grammar is that the head grammar includes a wrapping operation which allows one string to be inserted into another string at a specific point (the head). It is known that for both tree adjoining grammars and head grammars, the class of string languages generated by the grammars is larger than the class of context-free languages (e.g. they are able to define the language $\{a^n b^n c^n d^n \mid n \geq 0\}$ [10]). In [10] it is shown that the two formalisms generate exactly the same class of string languages, and that these languages are mildly context-sensitive.
The notion of mild context-sensitivity tries to capture mathematical and computational properties that formal models for the description and analysis of natural language should possess. The notion of mild context-sensitivity was first mentioned in [4] and sparked active research leading to many different approaches and definitions thereof (see, for example, [6]). There has been much discussion about the linguistic differences between mildly context-sensitive grammar formalisms, and in general, investigations mainly focus on polynomial parsing algorithms. Formal properties of mildly context-sensitive grammar formalisms have not been as extensively considered. The examination of the degree of nondeterminism for TAGs and MHGs is a step in that direction. It would be interesting to consider whether there are any linguistic implications for the degree of nondeterminism: for example, are there aspects of natural language modelling which are best done with a grammar having a higher (or lower) degree of nondeterminism than others?

Notational Conventions
The reader is assumed to be familiar with the basic notions in formal language theory. We use the following notational conventions and definitions in this paper. |S| denotes the cardinality of the set S, ∅ denotes the empty set, ∪ denotes set union, and \ denotes set difference. S is called an alphabet if it is a finite nonempty set of symbols. $\mathbb{N}$ denotes the set of natural numbers {1, 2, 3, ...}. For any set X, a word over X is a finite sequence of symbols from X. λ will be used to denote the empty word. The concatenation of two words x and y is denoted by xy and represents the word formed by the juxtaposition of x and y. The concatenation of a word x and a set S is xS = {xy | y ∈ S} (Sx is similarly defined). $X^*$ is the free monoid generated by X with concatenation as binary operation and λ as identity element. $X^+ = X^* \setminus \{\lambda\}$.

Tree Adjoining Grammars (TAGs)
Tree Adjoining Grammars (TAGs) are linguistically motivated tree-generating grammars which were originally introduced in [5]. For linguistic applications, TAGs have an advantage over string generating grammars such as context-free grammars because the elementary objects and all the objects generated are trees, which represent syntactic structure explicitly, as opposed to strings, which do not. In what follows, we give an informal description of TAGs first and their formal definition later.
The components of a TAG are a set of initial trees and a set of auxiliary trees. Each node in an initial tree or an auxiliary tree is labelled by a terminal symbol, by λ, or by a nonterminal symbol and constraints (which serve to restrict adjunction at that node, as will be explained later). The initial trees are the axioms used in the generation of new trees. The only means by which new trees are generated is the adjunction operation, which allows an auxiliary tree to be inserted into an initial tree or a derived (i.e. previously generated) tree. TAGs which include a second operation, substitution, are not discussed here as they are equivalent in generative power to TAGs which use only adjoining.
Tree adjunction is illustrated in Figure 1 (adapted from [10]). The tree shown on the left, γ, is an initial tree or a derived tree. The root node of γ is labelled by the nonterminal A, and γ contains an interior node n which is labelled by the nonterminal B. The tree in the centre, β, is an auxiliary tree in which both the root node and the foot node, a special node on the frontier of the tree, are labelled by B. As the nonterminal labelling n in γ and the nonterminal labelling the root node of β are the same, it is possible to adjoin β at n. Adjunction results in the new tree $\gamma'$, which is constructed by removing the subtree rooted at n from γ, inserting β into γ at the point where n was removed, and then replacing the foot node in β by the subtree originally rooted at n.
The language defined by a TAG G is the set of all words which are produced as the yield of some tree generated through zero or more adjunction operations in G. The yield of a tree is the word obtained by concatenating the terminal symbols on the leaf nodes of the tree, read from left to right. In a TAG, initial trees must have only terminal symbols on their leaf nodes, and auxiliary trees have terminal symbols on all leaf nodes except for the foot node. Thus, every tree generated through adjunction operations in G has a terminal word as its yield, and that word is an element of L(G). All trees discussed in this paper are finite.
In a TAG, a node n in a tree γ is labelled either by a terminal symbol, by λ, or by a triple of the form $\langle A, sa_{(\gamma,n)}, oa_{(\gamma,n)} \rangle$ where A is a nonterminal symbol and $sa_{(\gamma,n)}$ and $oa_{(\gamma,n)}$ are called the adjunction constraints and have the following interpretations:
- $sa_{(\gamma,n)}$ is a set of trees. For a given node, $sa_{(\gamma,n)}$ (selective adjunction) is the set of trees from the grammar which are allowed to be adjoined at that node. We assume that for all β in $sa_{(\gamma,n)}$, the root node of β is labelled by the same nonterminal as n. If $sa_{(\gamma,n)} = \emptyset$, then adjunction is not permitted at that node, and we can write NA to indicate a null adjunction constraint.
- $oa_{(\gamma,n)} \in \{true, false\}$. If $oa_{(\gamma,n)}$ has the value true then we speak about an OA (obligatory adjunction) constraint, and if $oa_{(\gamma,n)}$ has the value false then this indicates that adjunction is optional.
Definition 1. A tree adjoining grammar (TAG) is a quadruple G = (N, T, I, A), where N is the alphabet of nonterminal symbols, T is the alphabet of terminal symbols, N ∩ T = ∅, and I and A are given as follows:
- I is a finite set of initial trees where each α ∈ I satisfies:
  • All interior nodes of α are labelled by $\langle B, sa_{(\alpha,n)}, oa_{(\alpha,n)} \rangle$ with B ∈ N, $sa_{(\alpha,n)} \subseteq A$ and $oa_{(\alpha,n)} \in \{true, false\}$.
  • All leaf nodes of α are labelled by some u ∈ T ∪ {λ}.
- A is a finite set of auxiliary trees where each β ∈ A satisfies:
  • All interior nodes of β are labelled by $\langle B, sa_{(\beta,n)}, oa_{(\beta,n)} \rangle$ with B ∈ N, $sa_{(\beta,n)} \subseteq A$ and $oa_{(\beta,n)} \in \{true, false\}$.
  • All leaf nodes of β are labelled by some u ∈ T ∪ {λ}, except the foot node, denoted by ft(β), which carries the same category (but not necessarily the same adjunction constraints) as the root node.
Tree adjunction is a partial ternary operation ∇(γ, β, n) which produces a new tree, $\gamma'$, which is a copy of γ with the auxiliary tree β inserted at the node with address n. We define a derived tree to be an initial tree, an auxiliary tree or a tree produced by an application of ∇. We say adjunction is permitted when the following conditions hold for the arguments of ∇: γ is a derived tree, β is an auxiliary tree, and n is the address of an interior node in γ with label $\gamma(n) = \langle B, sa_{(\gamma,n)}, oa_{(\gamma,n)} \rangle$. The root node of β must be labelled by the same nonterminal as n, that is, by $\langle B, sa_{(\beta,\lambda)}, oa_{(\beta,\lambda)} \rangle$, and β must be an element of $sa_{(\gamma,n)}$.
After adjunction the labels of the nodes are unchanged from their original labels in γ and β, except for the nodes affected by adjunction: the node at address n which now carries the label from the root node of β, and the foot node of β with the change that the oa constraint on the node is set to false.
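The adjunction operation described above can be made concrete with a small sketch. The encoding is an assumption made for illustration only: the `Node` class, the use of tuples of child indices as node addresses, and the names `adjoin`, `plug_foot` and `tree_yield` are hypothetical and do not come from the paper. The example grammar, with a single auxiliary tree, is a common textbook-style TAG for $\{a^n b^n c^n d^n \mid n \geq 0\}$ and is not claimed to be the exact grammar of [10].

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Node:
    label: str                          # nonterminal, terminal, or "" for the empty word
    children: Tuple["Node", ...] = ()
    sa: FrozenSet[str] = frozenset()    # names of auxiliary trees adjoinable here (empty = NA)
    oa: bool = False                    # obligatory adjunction flag
    foot: bool = False                  # marks the foot node of an auxiliary tree

def plug_foot(beta: Node, sub: Node) -> Node:
    """Copy of beta in which the foot node receives the children of sub."""
    if beta.foot:
        # The foot keeps its label and sa set from beta; its oa constraint becomes false.
        return replace(beta, children=sub.children, oa=False, foot=False)
    return replace(beta, children=tuple(plug_foot(c, sub) for c in beta.children))

def adjoin(gamma: Node, name: str, beta: Node, addr: Tuple[int, ...]) -> Node:
    """Adjoin the auxiliary tree beta (called name) at address addr of gamma."""
    if addr:
        kids = list(gamma.children)
        kids[addr[0]] = adjoin(kids[addr[0]], name, beta, addr[1:])
        return replace(gamma, children=tuple(kids))
    # Permitted only at an interior node with matching nonterminal and beta in its sa set.
    if not gamma.children or gamma.label != beta.label or name not in gamma.sa:
        raise ValueError("adjunction not permitted at this node")
    return plug_foot(beta, gamma)

def tree_yield(t: Node) -> str:
    """Concatenate the leaf labels from left to right (a foot leaf contributes nothing)."""
    if not t.children:
        return "" if t.foot else t.label
    return "".join(tree_yield(c) for c in t.children)

ONLY_BETA = frozenset({"beta"})
beta = Node("S", children=(
    Node("a"),
    Node("S", sa=ONLY_BETA, children=(Node("b"), Node("S", foot=True), Node("c"))),
    Node("d"),
))
alpha = Node("S", sa=ONLY_BETA, children=(Node(""),))

tree, words = alpha, []
for n in range(3):
    words.append(tree_yield(tree))
    tree = adjoin(tree, "beta", beta, (1,) * n)   # adjoin at the innermost open S node
print(words)  # ['', 'abcd', 'aabbccdd']
```

Because every sa set in this grammar contains at most one tree, the sketch also illustrates how a grammar with no choice between auxiliary trees can still produce a non-context-free pattern.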
For a TAG G = (N, T, I, A), a derivation step in G will be denoted by $\Longrightarrow_G$. The tree $\gamma'$ can be derived from γ if and only if there exist β ∈ A and a node address n in γ such that adjunction of β in γ at n is permitted and $\nabla(\gamma, \beta, n) = \gamma'$. Then we write $\gamma \Longrightarrow_G \gamma'$, and $\Longrightarrow_G^*$ denotes the reflexive and transitive closure of $\Longrightarrow_G$.
The tree language generated by G is the set of all trees which can be generated in zero or more derivation steps from the initial trees of G, and in which no nodes remain which are labelled by an OA (obligatory adjunction) constraint:
$T(G) = \{\gamma \mid \alpha \Longrightarrow_G^* \gamma \text{ for some } \alpha \in I \text{ and } \gamma \text{ has no OA nodes}\}$
The yield of a tree is the string one obtains by concatenating the labels on the leaf nodes from left to right.
For a TAG G = (N, T, I, A) with tree language T(G), the tree adjoining language generated by G is
$L(G) = \{\mathrm{yield}(\gamma) \mid \gamma \in T(G)\}$
Let $\mathcal{L}_{TAG}$ represent the family of tree adjoining languages.

Degree of Nondeterminism for TAGs
The degree of nondeterminism for tree adjoining grammars will measure the amount of choice between auxiliary trees which can be adjoined within a given TAG. When defining the degree of nondeterminism for TAGs, an essential ambiguity in the interpretation has to be taken into account. On the one hand, when defining the degree of nondeterminism for a given node n in a tree γ, one could consider only the auxiliary trees in the set $sa_{(\gamma,n)}$, which can be adjoined at that node. On the other hand, one could consider all auxiliary trees in the set A for the given tree adjoining grammar (even if they are not in the set $sa_{(\gamma,n)}$). We will call these views weak degree of nondeterminism and strong degree of nondeterminism, respectively. In this section we will define strong and weak degree of nondeterminism, and then show that the two measures are equivalent. For the following definitions, consider a TAG G = (N, T, I, A). Let γ represent an arbitrary tree in I ∪ A and β ∈ A represent an arbitrary auxiliary tree.

Definition 2. Weak degree of nondeterminism
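- For a node in γ at address n labelled by $\langle B, sa_{(\gamma,n)}, oa_{(\gamma,n)} \rangle$, the degree of the node is denoted by $\mathrm{Deg}_G(\gamma, n)$, and is defined as the number of trees in the selective adjunction set for the node. That is, $\mathrm{Deg}_G(\gamma, n) = |sa_{(\gamma,n)}|$.
- The weak degree of nondeterminism of a tree adjoining grammar G is denoted by $\mathrm{Det}_w(G)$, and is defined as the maximal degree of any node in a tree in G:
$\mathrm{Det}_w(G) = \max\{\mathrm{Deg}_G(\gamma, n) \mid \gamma \in I \cup A,\ n \text{ is a node address in } \gamma\}$
- The weak degree of nondeterminism of a tree adjoining language L, $\mathrm{Det}_w(L)$, is defined as the minimal weak degree of nondeterminism of any TAG capable of generating L:
$\mathrm{Det}_w(L) = \min\{\mathrm{Det}_w(G) \mid G \text{ is a TAG with } L(G) = L\}$
- For a nonterminal B ∈ N, the degree of B, denoted by $\mathrm{Deg}_G(B)$, is the number of auxiliary trees in A whose root node is labelled by B.
- The strong degree of nondeterminism for a tree adjoining grammar G, denoted by $\mathrm{Det}_s(G)$, is defined as the maximal degree of a nonterminal in N:
$\mathrm{Det}_s(G) = \max\{\mathrm{Deg}_G(B) \mid B \in N\}$
- The strong degree of nondeterminism of a tree adjoining language L, $\mathrm{Det}_s(L)$, is defined as the minimal strong degree of nondeterminism of any TAG capable of generating L:
$\mathrm{Det}_s(L) = \min\{\mathrm{Det}_s(G) \mid G \text{ is a TAG with } L(G) = L\}$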
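The two grammar-level measures can be computed directly from the node labels. The following is a minimal sketch under stated assumptions: the flat encoding of a tree as its root nonterminal plus a list of (nonterminal, sa set) pairs is hypothetical, and the degree of a nonterminal B is taken to be the number of auxiliary trees whose root node is labelled by B.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical flat encoding: (root nonterminal, [(nonterminal, sa set), ...]),
# with one list entry per nonterminal-labelled node of the tree.
Tree = Tuple[str, List[Tuple[str, FrozenSet[str]]]]

def weak_degree(initial: Dict[str, Tree], auxiliary: Dict[str, Tree]) -> int:
    """Largest sa set found on any node of any elementary tree."""
    trees = list(initial.values()) + list(auxiliary.values())
    return max((len(sa) for _, nodes in trees for _, sa in nodes), default=0)

def strong_degree(nonterminals: Iterable[str], auxiliary: Dict[str, Tree]) -> int:
    """Largest number of auxiliary trees sharing one root nonterminal."""
    roots = [root for root, _ in auxiliary.values()]
    return max((roots.count(b) for b in nonterminals), default=0)

NA: FrozenSet[str] = frozenset()
initial = {"alpha": ("S", [("S", frozenset({"b1", "b2"}))])}
auxiliary = {
    "b1": ("S", [("S", NA), ("S", NA)]),
    "b2": ("S", [("S", NA), ("S", NA)]),
    "b3": ("S", [("S", frozenset({"b1"})), ("S", NA)]),
}

print(weak_degree(initial, auxiliary))   # 2
print(strong_degree({"S"}, auxiliary))   # 3
```

In this toy grammar three auxiliary trees share the root label S, but no node allows more than two of them, so the strong degree exceeds the weak degree.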
We will now show that strong and weak degree of nondeterminism are equivalent measures for TAGs. We first show that $\mathrm{Det}_w(G) \leq \mathrm{Det}_s(G)$ for every TAG G.
Proof. By definition, for every node n in a tree γ ∈ I ∪ A labelled by $\langle B, sa_{(\gamma,n)}, oa_{(\gamma,n)} \rangle$, we have $sa_{(\gamma,n)} \subseteq A$, and every tree in $sa_{(\gamma,n)}$ has a root node labelled by B. Therefore, $\mathrm{Deg}_G(\gamma, n) \leq \mathrm{Deg}_G(B)$ for any given node labelled by B in a tree γ at n, and thus $\mathrm{Det}_w(G) \leq \mathrm{Det}_s(G)$.
Conversely, for every TAG G an equivalent TAG can be constructed whose strong degree of nondeterminism equals the weak degree of nondeterminism of G.
Proof. The intuitive idea behind the proof is that the set of auxiliary trees which can be adjoined at any given node is determined by two conditions: (i) the nonterminal symbol labelling the node, and (ii) the sa constraint which restricts the subset of auxiliary trees which are actually permitted to be adjoined at that node. Algorithm 1 below works by making copies of the auxiliary trees such that for an sa set containing $\{\beta_1, \ldots, \beta_k\}$, k new auxiliary trees are introduced, whose root nodes are labelled by a common nonterminal which is used only for the auxiliary trees in that sa set. The result of copying and relabelling is that the strong degree of nondeterminism for the grammar is reduced to the weak degree, because the number of auxiliary trees labelled by any given nonterminal is equal to the size of the sa set in which the auxiliary trees bearing that nonterminal appear. The algorithm recursively relabels all new auxiliary trees which are created.
An example of the effect of Algorithm 1 for one node is shown in Figure 2. The trees at the top, $\alpha_1$, $\beta_1$ and $\beta_2$, are the trees from the TAG before relabelling takes place. In the new initial tree, $\alpha'_1$, the relabelled node can be seen. The new auxiliary trees, $\delta_1$ and $\delta_2$, are copies of $\beta_1$ and $\beta_2$ respectively for which new root and foot nodes have been added, and relabelling has been recursively applied to produce $\beta'_1$ and $\beta'_2$.

Algorithm 1.
  Preconditions: … considered, $A'$ is the set of new auxiliary trees constructed so far
  Postconditions: N has been updated to include any new nonterminals, t is unchanged, $A'$ has been updated to include any new trees resulting from the relabelling of t
  Returns a new tree $t'$ which is the relabelled t
  Let $t'$ be a copy of t
  For each node n of t labelled by $\langle A, sa_{(t,n)}, oa_{(t,n)} \rangle$, with $sa_{(t,n)} = \{\beta_1, \ldots, \beta_k\}$
    Let $A_{\beta_1 \cdots \beta_k}$ be a nonterminal symbol
    For i = 1, ..., k
      Let $\delta_i$ be a new tree name, $\delta_i \notin A'$
      Let $\delta_i$ be an auxiliary tree constructed as follows:
        • label the root node of $\delta_i$ by $\langle A_{\beta_1 \cdots \beta_k}, \emptyset, false \rangle$
        • connect the root node of $\delta_i$ to the root node of a copy of $\beta_i$
        • connect the foot node of the copy of $\beta_i$ to the foot node of $\delta_i$
        • label the foot node of $\delta_i$ by $\langle A_{\beta_1 \cdots \beta_k}, \emptyset, false \rangle$
    Let $sa_{(t',n)} = \{\tau \mid \tau \in A',\ \tau(\lambda) = \langle A_{\beta_1 \cdots \beta_k}, sa_{(\tau,\lambda)}, oa_{(\tau,\lambda)} \rangle\}$ ($sa_{(t',n)}$ is the set of all trees in $A'$ whose root nodes are labelled by $A_{\beta_1 \cdots \beta_k}$)
    Let the node corresponding to n in $t'$ be labelled by $\langle A_{\beta_1 \cdots \beta_k}, sa_{(t',n)}, oa_{(t,n)} \rangle$
End Function

Thus, as the strong degree of nondeterminism can be reduced to the weak degree for any given TAG, one measure of degree of nondeterminism is sufficient. We can omit reference to strong or weak in our notation, and therefore, Det(G) will be used to denote the (weak) degree of nondeterminism for TAGs, and $\mathrm{Det}_{TAG}(L)$ will denote the degree of nondeterminism for tree adjoining languages.
Finally, we will show that for a TAG G with degree of nondeterminism greater than 2, we can create an equivalent TAG $G'$ with degree of nondeterminism equal to 2. Thus, the degree of nondeterminism for any tree adjoining language is at most 2.
Proof. The following Algorithm 2 examines all the nodes in the initial trees and auxiliary trees of G. When a node n in a tree γ is found with Deg(γ, n) > 2, this indicates that there is a choice between more than two auxiliary trees for adjunction at that node. Suppose the selective adjunction set for the node is $sa_{(\gamma,n)} = \{\beta_1, \ldots, \beta_k\}$ with k > 2. The algorithm works by introducing new auxiliary trees $\delta_1, \ldots, \delta_{k-2}$, each consisting of only a root node and a foot node. The purpose of the $\delta_i$ trees is to reduce the choice between auxiliary trees to 2 at any given node. Node n is relabelled such that only $\beta_1$ or $\delta_1$ can be adjoined, that is, $sa_{(\gamma,n)} = \{\beta_1, \delta_1\}$. At the root node of $\delta_1$, $\beta_2$ or $\delta_2$ can be adjoined, that is, $sa_{(\delta_1,\lambda)} = \{\beta_2, \delta_2\}$. Generally, for $\delta_i$ with $1 \leq i < k-2$, $sa_{(\delta_i,\lambda)} = \{\beta_{i+1}, \delta_{i+1}\}$. For $\delta_{k-2}$, the root node is labelled by the sa set $sa_{(\delta_{k-2},\lambda)} = \{\beta_{k-1}, \beta_k\}$. Introduction of new auxiliary trees and relabelling is done for all nodes in G with degree greater than 2. The resulting TAG $G'$ generates the same language as G, but contains no node with more than 2 trees in its sa set, and therefore $\mathrm{Det}(G') = 2$.
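The chaining of sa sets used in this proof can be sketched at the level of tree names alone. This is an illustration under assumptions, not a transcription of Algorithm 2: trees are represented only by their names, the helper names `chain_sa` and `adjoinable` are hypothetical, the generated names `delta1`, `delta2`, ... are assumed not to clash with existing tree names, and the construction of the root and foot nodes of the new trees is left out.

```python
from typing import Dict, List, Tuple

def chain_sa(betas: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split an sa set of size k > 2 into a chain of sa sets of size 2.

    Returns the new sa set for the node and, for each new tree delta_i,
    the sa set placed on its root node.
    """
    k = len(betas)
    if k <= 2:
        return list(betas), {}
    deltas = [f"delta{i}" for i in range(1, k - 1)]       # delta_1 .. delta_{k-2}
    node_sa = [betas[0], deltas[0]]                       # {beta_1, delta_1}
    delta_sa: Dict[str, List[str]] = {}
    for i in range(1, k - 1):                             # i = 1 .. k-2
        if i < k - 2:
            delta_sa[deltas[i - 1]] = [betas[i], deltas[i]]        # {beta_{i+1}, delta_{i+1}}
        else:
            delta_sa[deltas[i - 1]] = [betas[k - 2], betas[k - 1]]  # {beta_{k-1}, beta_k}
    return node_sa, delta_sa

def adjoinable(sa: List[str], delta_sa: Dict[str, List[str]]) -> List[str]:
    """Original trees reachable from a node by passing through delta trees."""
    found: List[str] = []
    for name in sa:
        found += adjoinable(delta_sa[name], delta_sa) if name in delta_sa else [name]
    return found

node_sa, delta_sa = chain_sa(["b1", "b2", "b3", "b4"])
print(node_sa)    # ['b1', 'delta1']
print(delta_sa)   # {'delta1': ['b2', 'delta2'], 'delta2': ['b3', 'b4']}
print(adjoinable(node_sa, delta_sa))  # ['b1', 'b2', 'b3', 'b4']
```

Every sa set produced has exactly two members, while the set of original trees that can eventually be adjoined at the node is unchanged.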

Modified Head Grammars (MHGs)
We will consider Modified Head Grammars (MHGs), which were proposed in [11] and differ only slightly from the definition in [7]. The strings used in modified head grammars are called headed strings. In a headed string, a special position between two symbols, marked by ↑, is designated as the head of the string. MHGs use a wrapping operation to insert one string into another, and the purpose of the head is to designate the insertion point during this operation. For an alphabet X, let $H_X$ be the set of headed strings over X. $H_X$ is defined as:
$H_X = \{v{\uparrow}w \mid v, w \in X^*\}$
For example, for the alphabet X = {a, b, c}, abc↑cbacba, λ↑aaa and λ↑λ are three of the elements of $H_X$.
The production rules of an MHG are defined in terms of two types of operations, wrapping and concatenation, which are performed on headed strings.
The wrapping operation, $W : H_X^2 \to H_X$, is a binary operation which has the effect of inserting one string into another at the head. Given headed strings $v_1{\uparrow}w_1$ and $v_2{\uparrow}w_2$, the result of applying W is a new headed string comprised of $v_2{\uparrow}w_2$ inserted into $v_1{\uparrow}w_1$ at its head:
$W(v_1{\uparrow}w_1, v_2{\uparrow}w_2) = v_1 v_2{\uparrow}w_2 w_1$
The concatenation of headed strings is an n-ary operation denoted by $C_{m,n} : H_X^n \to H_X$, where n is the number of headed strings to be concatenated and m is the index of the string whose head becomes the head for the resulting string. The indices must satisfy n ≥ 1 and 1 ≤ m ≤ n. The interpretation of $C_{m,n}$ is as follows:
$C_{m,n}(v_1{\uparrow}w_1, v_2{\uparrow}w_2, \ldots, v_m{\uparrow}w_m, \ldots, v_n{\uparrow}w_n) = v_1 w_1 v_2 w_2 \cdots v_m{\uparrow}w_m \cdots v_n w_n$
Given a nonterminal alphabet N and a terminal alphabet T, a headed string expression over N and T is recursively defined as follows:
- Every headed string $\sigma \in H_T$ is a headed string expression.
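- For all A ∈ N, A is a headed string expression.
- If $\sigma_1$ and $\sigma_2$ are headed string expressions, then $W(\sigma_1, \sigma_2)$ is a headed string expression.
- If $\sigma_1, \ldots, \sigma_n$ are headed string expressions and 1 ≤ m ≤ n, then $C_{m,n}(\sigma_1, \ldots, \sigma_n)$ is a headed string expression.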
Let $E_{N,T}$ represent the set of headed string expressions over N and T. By convention, we will use σ to represent a headed string expression. If a headed string expression contains no nonterminals, we call it closed.
A modified head grammar (MHG) is a quadruple G = (N, T, P, S), where N is a finite set of nonterminal symbols, T is a finite set of terminal symbols, S ∈ N is the start symbol, and P is a set of production rules $\{p_1, \ldots, p_k\}$, each of the form A → σ with A ∈ N and $\sigma \in E_{N,T}$.
Consider an MHG, G = (N, T, P, S), with $p_i = A \to \sigma_i \in P$. Given a headed string expression $\sigma \in E_{N,T}$ containing a nonterminal A, we may apply the rule $p_i$ to replace one instance of A in σ by the right-hand side of $p_i$, $\sigma_i$. Let $\sigma'$ denote the resulting expression. Then we write $\sigma \Longrightarrow_G \sigma'$ to indicate that $\sigma'$ can be derived from σ using a production rule in G, and $\Longrightarrow_G^*$ denotes the reflexive and transitive closure of $\Longrightarrow_G$. If the grammar in use is clear from the context, we write $\Longrightarrow$ rather than $\Longrightarrow_G$.
For an MHG G = (N, T, P, S), the expression language generated by G, E(G), is the set of all closed headed string expressions which can be derived from S using the rules of G. Formally,
$E(G) = \{\sigma \in E_{N,T} \mid S \Longrightarrow_G^* \sigma \text{ and } \sigma \text{ is closed}\}$
The head language generated by G, H(G), is the set of headed strings which result from the evaluation, according to the definitions of W and $C_{m,n}$, of the closed headed string expressions in E(G):
$H(G) = \{h \in H_T \mid h \text{ is the value of some } \sigma \in E(G)\}$
The language generated by G, L(G), is the set of strings one obtains by removing the heads from the strings in H(G):
$L(G) = \{vw \mid v{\uparrow}w \in H(G)\}$
Let $\mathcal{L}_{MHG}$ denote the family of languages which can be defined by MHGs.
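The evaluation of a closed headed string expression can be sketched as follows. The encoding is an assumption made for illustration: a headed string v↑w is a pair `(v, w)`, expressions are nested tuples tagged `"H"`, `"W"` and `"C"`, and the function names are hypothetical. The sketch also assumes that wrapping places the head of the result at the head of the inserted string, so that W(v1↑w1, v2↑w2) = v1v2↑w2w1.

```python
from typing import Tuple

Headed = Tuple[str, str]   # (v, w) stands for the headed string v↑w

def wrap(h1: Headed, h2: Headed) -> Headed:
    """W: insert h2 into h1 at the head of h1."""
    (v1, w1), (v2, w2) = h1, h2
    return (v1 + v2, w2 + w1)

def concat(m: int, *hs: Headed) -> Headed:
    """C_{m,n}: concatenate n headed strings, keeping the head of the m-th one."""
    n = len(hs)
    if not 1 <= m <= n:
        raise ValueError("need n >= 1 and 1 <= m <= n")
    left = "".join(v + w for v, w in hs[:m - 1]) + hs[m - 1][0]
    right = hs[m - 1][1] + "".join(v + w for v, w in hs[m:])
    return (left, right)

def evaluate(expr: tuple) -> Headed:
    """Evaluate a closed expression: ("H", v, w), ("W", e1, e2) or ("C", m, e1, ..., en)."""
    tag = expr[0]
    if tag == "H":
        return (expr[1], expr[2])
    if tag == "W":
        return wrap(evaluate(expr[1]), evaluate(expr[2]))
    if tag == "C":
        return concat(expr[1], *(evaluate(e) for e in expr[2:]))
    raise ValueError(f"not a closed headed string expression: {expr!r}")

def remove_head(h: Headed) -> str:
    """Map a headed string in H(G) to the corresponding word of L(G)."""
    return h[0] + h[1]

expr = ("W", ("H", "a", "d"), ("C", 2, ("H", "", "b"), ("H", "", "c")))
print(evaluate(expr))               # ('ab', 'cd')
print(remove_head(evaluate(expr)))  # abcd
```

In the example, the inner concatenation yields b↑c, which is then wrapped by a↑d to give ab↑cd.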

Degree of Nondeterminism for MHGs
Let G = (N, T, P, S) be an MHG. For a nonterminal A, let $P_A$ be the set of production rules with A on the left-hand side. That is,
$P_A = \{p \in P \mid p = A \to \sigma \text{ for some } \sigma \in E_{N,T}\}$
In line with the usual definition recalled in the introduction, the degree of nondeterminism of G is $\mathrm{Det}(G) = \max\{|P_A| \mid A \in N\}$, and the degree of nondeterminism of an MHG language L is the minimal degree of nondeterminism of any MHG generating L.
It will now be shown that the degree of nondeterminism for any MHG language is at most 2. The proof of Theorem 4 contains an algorithm which generates an MHG with degree of nondeterminism equal to 2 from any MHG with degree of nondeterminism greater than 2.
Proof. An MHG G with Det(G) > 2 contains nonterminals A ∈ N which appear on the left-hand side of more than 2 production rules. Algorithm 3 introduces new nonterminal symbols and production rules so that the choice between production rules at any point in a derivation is always binary. To understand how it works, suppose there is a nonterminal A ∈ N which appears on the left-hand side of 3 rules, $p_1 : A \to \sigma_1$, $p_2 : A \to \sigma_2$ and $p_3 : A \to \sigma_3$. After execution of the algorithm, the nonterminal A would be replaced by three nonterminals, $A_1$, $A_2$ and $A_3$, and the rules $p_1$, $p_2$ and $p_3$ would be replaced by six production rules. The rules are "chained" such that the same strings can be derived but the choice of production rules at any given time is reduced to 2.
Here $\sigma'_i$ is the headed string expression which results from replacing every nonterminal B ∈ N appearing in $\sigma_i$ by $B_1$.
A deterministic MHG is an MHG G = (N, T, P, S) for which Det(G) = 1. In other words, no nonterminal A ∈ N appears on the left-hand side of more than one production rule $p_i \in P$.
The language generated by a deterministic MHG contains at most one word, that is, |L(G)| ≤ 1.
Proof. We can observe the following requirements for the production rules of a deterministic MHG with a nonempty language:
(i) At least one production rule must have only terminal symbols or λ on the right-hand side.
(ii) The same nonterminal symbol may not appear on the left-hand and right-hand side of a given production rule.
(iii) There can be no set of production rules $P_{cycle} = \{p_1, \ldots, p_k\} \subseteq P$ of the following form:
$p_1 : A_1 \to \sigma_1$, where $\sigma_1$ contains $A_2$
...
$p_i : A_i \to \sigma_i$, where $\sigma_i$ contains $A_{i+1}$
...
$p_k : A_k \to \sigma_k$, where $\sigma_k$ contains $A_1$.
(i) is necessary so that it is possible to derive a headed string expression which is closed. (ii) and (iii) are necessary so that the sequence of derivations does not contain a loop. Such a loop would prevent the sequence of derivations from ending, since each nonterminal appears on the left-hand side of only one rule, and therefore the derivation leading to the loop would be chosen every time. Since each nonterminal can be rewritten in only one way, every terminating derivation starting from S produces the same closed headed string expression, and hence the same word. Thus, if L(G) is nonempty then |L(G)| = 1.
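The argument can be illustrated by expanding a deterministic grammar mechanically. The following is a minimal sketch under assumptions: the grammar is encoded as a dictionary with exactly one right-hand side per nonterminal, expressions are nested tuples tagged `"H"` (headed string), `"N"` (nonterminal), `"W"` and `"C"`, and the function name `expand` is hypothetical. The function returns the unique closed expression derivable from an expression, or `None` when a loop makes every derivation non-terminating.

```python
from typing import Dict, FrozenSet, Optional

def expand(rules: Dict[str, tuple], expr: tuple,
           active: FrozenSet[str] = frozenset()) -> Optional[tuple]:
    """Rewrite every nonterminal by its only rule; None signals a derivation loop."""
    tag = expr[0]
    if tag == "H":                         # ("H", v, w): already closed
        return expr
    if tag == "N":                         # ("N", A): a nonterminal
        name = expr[1]
        if name in active or name not in rules:
            return None                    # loop, or no rule: no closed expression
        return expand(rules, rules[name], active | {name})
    # ("W", e1, e2) or ("C", m, e1, ..., en): expand all arguments
    head, args = (expr[:2], expr[2:]) if tag == "C" else (expr[:1], expr[1:])
    out = [expand(rules, e, active) for e in args]
    if any(o is None for o in out):
        return None
    return head + tuple(out)

# One rule per nonterminal, so no choice is ever available.
rules = {
    "S": ("W", ("N", "A"), ("N", "B")),
    "A": ("H", "a", "d"),
    "B": ("C", 1, ("H", "b", "c")),
}
print(expand(rules, ("N", "S")))
# ('W', ('H', 'a', 'd'), ('C', 1, ('H', 'b', 'c')))

looping = {"S": ("C", 1, ("N", "A")), "A": ("W", ("N", "S"), ("H", "", ""))}
print(expand(looping, ("N", "S")))  # None
```

The first grammar has a single closed expression and therefore a single word; the second violates requirement (iii) and generates the empty language.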

Conclusions
The relationship between TAGs and MHGs was explored in several papers, including [10] and [11]. In this paper, we have shown that for both TAGs and MHGs, a degree of nondeterminism of 2 suffices to generate all languages in their respective language families. Reducing the degree of nondeterminism with our algorithms can considerably increase the number of elementary trees in a TAG or the number of productions in an MHG. We note that there is a significant difference between deterministic MHGs and deterministic TAGs. In [10], an example of a TAG appears which has only one auxiliary tree (and is therefore deterministic by our definition), and yet it generates the language $\{a^n b^n c^n d^n \mid n \geq 0\}$, which is non-context-free. By contrast, deterministic MHGs are only capable of generating languages for which |L| ≤ 1. Finally, one question concerning TAGs arose during our work and remains open. Although we know that deterministic TAGs are capable of generating non-context-free languages, we did not identify the class of languages which can be generated by deterministic TAGs.