Streaming Property Testing of Visibly Pushdown Languages

In the context of formal language recognition, we demonstrate the superiority of streaming property testers over streaming algorithms and property testers when the latter two are not combined. Introduced by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that are ε-far from it, while using the smallest possible memory (rather than limiting its number of input queries). Our main result is a streaming ε-property tester for visibly pushdown languages (Vpl) with memory space poly((log n)/ε). Our construction is done in three steps. First, we simulate a visibly pushdown automaton in one pass using a stack of small height but whose items can be of linear size. Second, those items are replaced by small sketches. These sketches rely on a notion of suffix-sampling that we introduce. This sampling is the key idea for benefiting from both streaming algorithms and property testers in the third step. Indeed, the last step relies on a (non-streaming) property tester for weighted regular languages based on a previous tester by Alon et al. This tester can directly be used for streaming testing of special cases of instances of Vpl that are already hard for both streaming algorithms and property testers. We then use it to decide the correctness of completed items, given their sketches, before removing them from the stack.


Introduction
We focus on streams representing data with both a linear ordering and a hierarchically nested matching of items. Data with such dual linear-hierarchical structure arise in various contexts, e.g. in semi-structured data management when handling HTML/XML documents, or in program analysis when considering executions of recursive programs. Regular languages, as recognized by finite state automata, have proved to be a natural and successful tool for expressing properties of streams, but they lack the ability to handle the hierarchical structure. Context-free languages easily capture the latter but turn out to be too expressive, and hence quickly lead to intractable complexity. In contrast, visibly pushdown languages (Vpl) [6], while encompassing regular languages, enjoy most of their good properties and can handle data with both a linear and a hierarchical structure. In the context of semi-structured documents, they are closely related to regular languages of unranked trees as captured by hedge automata: indeed, a well-known result [3] states that, when the tree is given by its depth-first traversal, such automata correspond to visibly pushdown automata (Vpa) (see e.g. [18] for an overview on automata and logic for unranked trees). In databases, this word encoding of an XML document is known as its SAX representation: the document is a linear sequence of text characters, along with a hierarchically nested matching of open-tags with closing tags. Numerous popular subclasses of XML documents (e.g. those satisfying a given DTD specification) are subclasses of Vpl. In program analysis, Vpa can capture natural properties of execution traces of recursive finite-state programs. For such programs, desirable specifications are expressed on the call-stack (e.g. "a module A should be invoked only if the module B belongs to the call-stack"): such properties can be expressed in the temporal logic of calls and returns (CaRet) [5,4], which itself is captured by Vpa. Hence, the analysis of execution traces boils down to checking membership in a Vpl.
Therefore, the study of Vpl is central to understanding how massive semi-structured data (e.g. large semi-structured documents or execution traces) can be analyzed by sublinear algorithms, such as streaming algorithms and property testers.
Historically, Vpl have gone by several names, such as input-driven languages or, more recently, languages of nested words. Intuitively, a Vpa is a pushdown automaton whose actions on the stack (push, pop or nothing) are solely decided by the currently read symbol. As a consequence, symbols can be partitioned into three groups: push, pop and neutral symbols. The complexity of Vpl recognition has been addressed in various computational models. The first results go back to the design of logarithmic space algorithms [11] as well as NC¹-circuits [13]. Later on, other models motivated by the context of massive data were considered, such as streaming algorithms and property testers (described below).
Streaming algorithms (see e.g. [22]) have only sequential access to their input, on which they can perform a single pass, or sometimes a small number of additional passes. The size of their internal (random access) memory is the crucial complexity parameter, which should be sublinear in the input size, and even polylogarithmic if possible. The area of streaming algorithms has experienced tremendous growth in many applications since the late 1990s. The analysis of Internet traffic [2], in which traffic logs are queried, was one of their first applications. Nowadays, they have found applications with big data, notably to test graph properties, and more recently in language recognition on very large inputs. The streaming complexity of language recognition was first considered for languages that arise in the context of memory checking [8,12] and of databases [28,27], and later on for formal languages [20,7]. However, even for simple Vpl, any randomized streaming algorithm with p passes requires memory Ω(n/p), where n is the input size [17].
As opposed to streaming algorithms, (standard) property testers [9,10,16] have random access to their input, but in the query model: they must query each piece of the input they need to access. They should sample only a sublinear fraction of their input, and ideally make a constant number of queries. In order to make the task of verification possible, decision problems need to be approximated as follows. Given a distance on words, an ε-tester for a language L distinguishes with high probability the words in L from those ε-far from L, using as few queries as possible. Property testing of regular languages was first considered for the Hamming distance [1]. When the distance allows sufficient modifications of the input, such as moves of arbitrarily large factors, it has been shown that any context-free language becomes testable with a constant number of queries [19,15]. However, for more realistic distances, property testers for simple languages require a large number of queries, especially if they have one-sided error only. For example, the complexity of an ε-tester for well-parenthesized expressions with two types of parentheses is between Ω(n^{1/11}) and O(n^{2/3}) [25], and it becomes linear, even for one type of parentheses, if we require one-sided error [1]. The difficulty of testing regular tree languages was also addressed when the tester can directly query the tree structure [23,24].
Faced with the intrinsic hardness of Vpl in both streaming and property testing, we study the complexity of streaming property testers of formal languages, a model of algorithms combining both approaches. Such testers were historically introduced for testing specific problems (groupedness) [14] relevant for network data. They were later studied in the context of testing the insert/extract-sequence of a priority-queue structure [12]. We extend these studies to classes of problems. A streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that are ε-far from it, while using the smallest possible memory (rather than limiting its number of input queries). Such an algorithm can simulate any standard non-adaptive property tester. Moreover, we will see that, using its full scan of the input, it can construct better sketches than in the query model.
In this paper, we consider a natural notion of distance for Vpl, the balanced-edit distance, which refines the edit distance on balanced words (where for each push symbol there is a matching pop symbol at the same height of the stack, and conversely). It can be interpreted as the edit distance on trees when trees are encoded as balanced words. Neutral symbols can be deleted/inserted, but any push symbol can only be deleted/inserted together with its matching pop symbol. Since our distance is larger than the standard edit distance, our testers are also valid for the edit distance.
In Section 3, we first design an exact algorithm that maintains a small stack but whose items can be of linear size, as opposed to the standard simulation of a pushdown automaton, which usually has a stack of possibly linear size but with constant-size items. In our algorithm, stack items are prefixes of some peaks (which we call unfinished peaks), where a peak is a balanced factor whose push symbols all appear before the first pop symbol. Our algorithm compresses an unfinished peak u = u_+ v_− when it is followed by a long enough sequence. More precisely, the compression applies to the peak v_+ v_− obtained by disregarding part of the prefix of the push sequence u_+. Those peaks are then inductively replaced, and therefore compressed, by the state-transition relation they define on the given automaton. The relation is then considered as a single symbol whose weight is the size of the peak it represents. In addition, to maintain a stack of logarithmic depth, one of the crucial properties of our algorithm (Proposition 6) is that it rewrites the input word as a peak formed by potentially a linear number of intermediate peaks, but with only a logarithmic number of nested peaks.
In Section 4, for the case of a single peak, we show how to sketch the current unfinished peak of our algorithm. The simplicity of those instances will let us highlight our first idea. Moreover, they are already expressive enough to demonstrate the superiority of streaming testers over streaming algorithms and property testers when the latter two are not combined. We first reduce the problem of streaming testing such instances to the problem of testing regular languages in the standard model of property testing (Theorem 16). Since our reduction induces weights on the letters of the new input word, we need a tester for weighted regular languages. Such a property tester has previously been devised in [24], extending constructions for unweighted regular languages [1,23]. However, we consider a slightly simpler construction that could be of independent interest. As a consequence we get a streaming property tester with polylogarithmic memory for recognizing peak instances of any given Vpl (Theorem 17), a task already hard for streaming algorithms and property testers (Fact 8).
ESA 2016

In Section 5, we construct our main tester for a Vpl L given by some Vpa. For this we introduce a more involved notion of sketches made of a polylogarithmic number of samples. They are based on a new notion of suffix sampling (Definition 18). This sampling consists of a decomposition of the string into an increasing sequence of suffixes, whose weights increase geometrically. Such a decomposition can be computed online on a data stream, and one can maintain samples in each suffix of the decomposition using a standard reservoir sampling. This suffix decomposition will allow us to simulate an appropriate sampling on the peaks we compress, even if we do not yet know where they start. Our sampling can be used to perform an approximate computation of the compressed relation by our new property tester of weighted regular languages, which we also use for single peaks. We first establish a stability result, which basically states that we can assume that our algorithm knows in advance where the peak it will compress starts (Lemma 22). Then we prove the robustness of our algorithm: words that are ε-far from L are rejected with high probability (Lemma 23). As a consequence, we get a one-pass streaming ε-tester for L with one-sided error η and memory space O(m^5 2^{3m^2} (log n)^6 (log 1/η)/ε^4), where m is the number of states of a Vpa recognizing L (Theorem 20).

Definitions and Preliminaries
Let N* be the set of positive integers, and for any n ∈ N*, let [n] = {1, 2, . . ., n}. A t-subset of a set S is any subset of S of size t. For a finite alphabet Σ we denote the set of finite words over Σ by Σ*. We denote by u • v (or simply uv) the word obtained by concatenating u and v. For a word u of length n, we write u(i) for its i-th letter and u[i, j] for the factor u(i) u(i + 1) · · · u(j). When we mention letters and factors of u we implicitly also mention their positions in u. We say that v is a sub-factor of v′, denoted v ≤ v′, if v = u[i, j] and v′ = u[i′, j′] with i′ ≤ i ≤ j ≤ j′. If v is a sub-factor of v′ then the overlap of v and v′ is v. Given two multisets of factors S and S′, we say that S ≤ S′ if there is an injection f : S → S′ such that for each factor v ∈ S, v ≤ f(v).

Weighted Words and Sampling
A weight function on a word u with n letters is a function λ : [n] → N* on the letters of u, whose value λ(i) is called the weight of u(i). A weighted word over Σ is a pair (u, λ) where u ∈ Σ* and λ is a weight function on u. We define |u(i)| = λ(i) and |u| = λ(1) + λ(2) + · · · + λ(n). The length of (u, λ) is the length of u. For simplicity, we will denote by u the weighted word (u, λ). Weighted letters will be used to substitute factors of the same weight.

Our algorithms will be based on sampling of small factors according to their weights. We introduce a very specific notion adapted to our setting. For a weighted word u, we denote by k-factor sampling on u the sampling over factors u[i, i + l] with probability |u(i)|/|u|, where l ≥ 0 is the smallest integer such that |u[i, i + l]| ≥ k if it exists, otherwise l is such that i + l is the last letter of u. More generally, we call such a factor a k-factor. For the special case of k = 1, we call this sampling a letter sampling on u. In fact the general case k > 1 simply reduces to k = 1. Indeed, observe that k-factor sampling can be obtained from letter sampling by sampling on the first letters of the factors and completing online any sampled letter to produce its associated k-factor. Therefore, from now on, we only focus on how to perform letter samplings, which we implicitly extend to samplings on k-factors when required. In particular, without further constraints, letter sampling can be implemented using a standard reservoir sampling (see Algorithm 1).
Even if our algorithm will require several samples from a k-factor sampling, we will often only be able to simulate this sampling by sampling either larger factors, more factors, or both. We introduce the notion of over-sampling to formalize this:

Definition 1. Let W_1 be a sampler producing a random multiset S_1 of factors of some given weighted word u. Then W_2 over-samples W_1 if it produces a random multiset S_2 of factors of u such that for each factor v of u, we have Pr(∃v

Finite State Automata and Visibly Pushdown Automata
A finite state automaton A consists of a finite set of states Q, an input alphabet Σ, a set of initial states $Q_{in} \subseteq Q$, a set of final states $Q_f \subseteq Q$, and a transition relation ∆ ⊆ Q × Σ × Q. We write $p \xrightarrow{u} q$ to mean that there is a sequence of transitions in A from p to q while processing u, and we call (p, q) a u-transition. A word u is accepted if $q_{in} \xrightarrow{u} q_f$ for some $q_{in} \in Q_{in}$ and $q_f \in Q_f$. The language L(A) of A is the set of words accepted by A, and we refer to such a language as a regular language. For Σ′ ⊆ Σ, the Σ′-diameter (or simply diameter when Σ′ = Σ) of A is the maximum over all possible pairs (p, q) ∈ Q² of min{|u| : $p \xrightarrow{u} q$ and u ∈ Σ′*}, whenever this minimum is not over an empty set. We say that A is Σ′-closed when $p \xrightarrow{u} q$ for some u ∈ Σ* if and only if $p \xrightarrow{u'} q$ for some u′ ∈ Σ′*.

A pushdown alphabet is a triple $\langle \Sigma_+, \Sigma_-, \Sigma_= \rangle$ that comprises three disjoint finite alphabets: $\Sigma_+$ is a finite set of push symbols, $\Sigma_-$ is a finite set of pop symbols, and $\Sigma_=$ is a finite set of neutral symbols. For any such triple, let $\Sigma = \Sigma_+ \cup \Sigma_- \cup \Sigma_=$. Intuitively, a visibly pushdown automaton [26] over $\langle \Sigma_+, \Sigma_-, \Sigma_= \rangle$ is a pushdown automaton restricted so that it pushes onto the stack only on reading a push symbol, it pops the stack only on reading a pop symbol, and it does not modify the stack on reading a neutral symbol. Up to coding, this notion is similar to that of input-driven pushdown automata [21] and of nested word automata [6].

To represent stacks we use a special bottom-of-stack symbol ⊥ that is not in Γ. A configuration of a Vpa A is a pair (σ, q), where q ∈ Q and σ ∈ ⊥ • Γ*. For a ∈ Σ, there is an a-transition from a configuration (σ, q) to (σ′, q′), denoted $(\sigma, q) \xrightarrow{a} (\sigma', q')$, in the following cases:
(1) If a is a push symbol, then σ′ = σγ for some (q, a, q′, γ) ∈ ∆, and we write $q \xrightarrow{a} (q', \mathrm{push}(\gamma))$.
(2) If a is a pop symbol, then σ = σ′γ for some (q, a, γ, q′) ∈ ∆, and we write $(q, \mathrm{pop}(\gamma)) \xrightarrow{a} q'$.
(3) If a is a neutral symbol, then σ′ = σ and (q, a, q′) ∈ ∆, and we write $q \xrightarrow{a} q'$.
A word u is accepted if $(\bot, q_{in}) \xrightarrow{u} (\bot, q_f)$ for some $q_{in} \in Q_{in}$ and $q_f \in Q_f$. The language L(A) of A is the set of words accepted by A, and we refer to such a language as a visibly pushdown language (Vpl).
At each step, the height of the stack is pre-determined by the prefix of u read so far. The height height(u) of u ∈ Σ* is the difference between the number of its push symbols and the number of its pop symbols. A word u is balanced if height(u) = 0 and height(u[1, i]) ≥ 0 for all i. We also say that a push symbol u(i) matches a pop symbol u(j) if height(u[i, j]) = 0 and height(u[i, k]) > 0 for all i < k < j. By extension, the height of u(i) is height(u[1, i − 1]) when u(i) is a push symbol, and height(u[1, i]) otherwise.
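The notions of height, balance and matching can be illustrated by a short Python sketch. It is only an illustration: the function names and the representation of the push and pop alphabets as Python sets are assumptions of this example, not part of the paper. Positions are 1-indexed, as in the text.

```python
def height(word, push, pop):
    """Number of push symbols minus number of pop symbols in `word`."""
    return sum((c in push) - (c in pop) for c in word)


def is_balanced(word, push, pop):
    """True when height(word) = 0 and no prefix has negative height.

    Only one counter is kept, so the check runs in a single pass.
    """
    h = 0
    for c in word:
        h += (c in push) - (c in pop)
        if h < 0:
            return False
    return h == 0


def matching(word, push, pop):
    """Map each matched push position to the position of its matching pop."""
    open_positions, match = [], {}
    for j, c in enumerate(word, start=1):
        if c in push:
            open_positions.append(j)
        elif c in pop and open_positions:
            match[open_positions.pop()] = j
    return match
```

For instance, with push = {"("} and pop = {")"}, the word "(a(b))" is balanced, its first letter matches its sixth letter, and its third letter matches its fifth.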
For all balanced words u, the property $(\sigma, p) \xrightarrow{u} (\sigma, q)$ does not depend on σ; therefore we simply write $p \xrightarrow{u} q$, and say that (p, q) is a u-transition. We also define, similarly to the notions for finite automata above, the Σ′-diameter of A (or simply diameter) and the notion of A being Σ′-closed. These definitions only consider balanced words.
Our model is inherently restricted to input words having no prefix of negative stack height, and we defined acceptance with an empty stack. This implies that only balanced words can be accepted. From now on, we assume that the input is balanced, as verifying this in a streaming context is easy.
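As an illustration of the definitions above, the following Python sketch simulates a (possibly nondeterministic) Vpa with an explicit stack and computes the set of u-transitions of a word u. The dictionary layout, the function names and the example automaton BRACKETS are assumptions made for this example only; transitions are written as tuples in the same order as in the text, and the bottom-of-stack symbol ⊥ is represented implicitly by the empty tuple. This is the naive simulation with a stack of linear height, not the compressed algorithm of Section 3.

```python
def step(configs, a, vpa):
    """All configurations reachable from `configs` by reading the letter a."""
    nxt = set()
    for stack, q in configs:
        for (p, b, p2, g) in vpa["push"]:        # (q, a, q', gamma)
            if (p, b) == (q, a):
                nxt.add((stack + (g,), p2))
        for (p, b, g, p2) in vpa["pop"]:         # (q, a, gamma, q')
            if (p, b) == (q, a) and stack and stack[-1] == g:
                nxt.add((stack[:-1], p2))
        for (p, b, p2) in vpa["neutral"]:        # (q, a, q')
            if (p, b) == (q, a):
                nxt.add((stack, p2))
    return nxt


def transition_relation(word, vpa):
    """Set of pairs (p, q) such that reading `word` from p with an empty
    stack can end in q with an empty stack."""
    rel = set()
    for p in vpa["states"]:
        configs = {((), p)}
        for a in word:
            configs = step(configs, a, vpa)
        rel |= {(p, q) for stack, q in configs if stack == ()}
    return rel


def accepts(word, vpa):
    """Acceptance with an empty stack, from an initial to a final state."""
    rel = transition_relation(word, vpa)
    return any((p, q) in rel for p in vpa["initial"] for q in vpa["final"])


# Hypothetical example: well-nested words over two bracket types,
# with a single neutral symbol "a".
BRACKETS = {
    "states": {0}, "initial": {0}, "final": {0},
    "push": {(0, "(", 0, "P"), (0, "[", 0, "B")},
    "pop": {(0, ")", "P", 0), (0, "]", "B", 0)},
    "neutral": {(0, "a", 0)},
}
```

With this example automaton, accepts("(a[a])", BRACKETS) holds, while "([)]" and "(" are rejected: the first because the popped symbol does not match the top of the stack, the second because the stack is not empty at the end.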

Streaming Property Testers
Assume we have, for any ε > 0, a criterion to declare that an input u is ε-far from a language L. An ε-tester for L accepts all inputs in L with probability 1 and rejects with high probability all inputs ε-far from L. Two-sided error testers have also been studied, but in this paper we stay with the notion of one-sided testers, which we adapt to the context of streaming algorithms as in [14].

Definition 3. Let ε > 0 and let L be a language. A streaming ε-tester for L with one-sided error η and memory s(n) is a randomized algorithm A such that, for any input u of length n given as a data stream:
(1) If u ∈ L, then A accepts with probability 1;
(2) If u is ε-far from L, then A rejects with probability at least 1 − η;
(3) A processes u within a single sequential pass while maintaining a memory of size at most s(n).

Even if we only focus on the space complexity of streaming testers, all our streaming testers have polylogarithmic (in n/ε) time per processed letter.
For a distance d between words, we say that a word u is ε-far from a language L if d(u, v) > ε|u| for every v ∈ L, i.e. the ε-neighborhood of u does not intersect L. Hence, any distance on words leads to a notion of streaming property tester. Note that any ε-tester for some distance d_1 is also a (cε)-tester for any other distance d_2 such that d_2 ≤ c d_1, where c > 0 is some constant. Indeed, if d_2(u, v) > cε|u| for every v ∈ L, then d_1(u, v) ≥ d_2(u, v)/c > ε|u| for every v ∈ L.

Balanced/Standard Edit Distance
The usual distance between words in property testing is the Hamming distance. In this work, we consider a distance that is easier to manipulate in property testing but still relevant for most applications, namely the edit distance, which we adapt to weighted words.
Given a word u, we define two possible edit operations: the deletion of a letter in position i with corresponding cost |u(i)|, and its converse operation, the insertion, where we also select a weight for the new u(i). Note that, for simplicity, we drop the usual substitution operation, leading to a possible multiplicative factor of 2 in the resulting distance. This is not an issue when designing streaming property testers, as observed above. The (standard) edit distance dist(u, v) between two weighted words u and v is defined as the minimum total cost of a sequence of edit operations changing u to v. All letters that have been neither inserted nor deleted must keep the same weight. For a restricted set of letters Σ′, define dist_Σ′(u, v) similarly, when insertions (but not deletions) are restricted to letters in Σ′ (this makes dist_Σ′ not symmetric).
We will also consider a restricted version of this distance for balanced words, motivated by our study of Vpl. Similarly, balanced-edit operations can be deletions or insertions of letters, but each deletion of a push symbol (resp. pop symbol) requires the deletion of the matching pop symbol (resp. push symbol). Similarly for insertions: if a push (resp. pop) symbol is inserted, then a matching pop (resp. push) symbol must also be inserted simultaneously. The cost of these operations is the weight of the affected letters, as with the edit operations. We define the balanced-edit distance bdist(u, v) between two balanced words as the minimum total cost of a sequence of balanced-edit operations changing u to v. Similarly to dist_Σ′(u, v) we define bdist_Σ′(u, v). We omit Σ′ when Σ′ = Σ.
When dealing with a visibly pushdown language, we will always use the balanced-edit distance, whereas we will use the standard edit distance for regular languages. Note that since the balanced-edit distance is larger than the standard edit distance, our testers will also be valid for the latter.

Exact Algorithm
Fix a Vpa A recognizing some Vpl L on $\Sigma = \Sigma_+ \cup \Sigma_- \cup \Sigma_=$. In this section, we design an exact streaming algorithm that decides whether an input belongs to L. Algorithm 2 maintains a stack of small height but whose items can be of linear size. In Section 5, we replace stack items by appropriate small sketches.

Notations and Algorithm Description
Call a peak a sequence of push symbols followed by an equal number of pop symbols, with possibly intermediate neutral symbols, i.e. an element of the language $\Lambda = \bigcup_{j \ge 0} ((\Sigma_=)^* \cdot \Sigma_+)^j \cdot (\Sigma_=)^* \cdot (\Sigma_- \cdot (\Sigma_=)^*)^j$. One can compress any peak v ∈ Λ by the set $R_v = \{(p, q) : p \xrightarrow{v} q\}$ of the v-transitions, and consider R_v as a new neutral symbol with weight |v|. In fact, for the purpose of the analysis of our algorithm, we augment neutral symbols by many more relations for which A remains Σ-closed. Indeed, we allow any relation R of any weight such that, when (p, q) ∈ R, there is a v ∈ Λ such that $p \xrightarrow{v} q$, but that v could be different for every (p, q) ∈ R. For the rest of the paper, they will be the only symbols with weight potentially larger than 1.

Definition 4. Let $\Sigma_Q$ be $\Sigma_=$ augmented by all letters 'R' encoding a relation R ⊆ Q × Q such that for every (p, q) ∈ R there is a balanced word u ∈ Σ* with $p \xrightarrow{u} q$. In addition we allow any weight |R| ≥ 1 for those letters. Let $\Lambda_Q$ be Λ where $\Sigma_=$ is replaced by $\Sigma_Q$.

In order to bound the size of the stack, Algorithm 2 considers the maximal balanced suffix v_2 of the topmost element v_1 • v_2 of the stack and, whenever |u_0| ≥ |v_2|/2, it computes the relation R_{v_2} and continues with a bigger current peak starting with v_1 (see lines 18 to 20 and Figure 1c). A consequence of this compression is that the elements in the stack have geometrically decreasing weight, and therefore the height of the stack used by Algorithm 2 is logarithmic in the length of the input stream. This can be proved by a direct inspection of Algorithm 2.

Proposition 6. Algorithm 2 accepts exactly when u ∈ L, while maintaining a stack of at most log |u| items.
We state that Algorithm 2, when processing an input u of length n, considers at most O(log n) nested peaks, that is, Depth(v) = O(log n) for all factors constructed in Algorithm 2.

The Special Case Of Peaks
We now consider restricted instances consisting of a single peak. For these instances, Algorithm 2 never uses its stack, but u_0 can be of linear size. We show how to replace u_0 by a small random sketch in order to get a streaming property tester using polylogarithmic memory. In Section 5, this notion of sketch will be extended to obtain our final streaming property tester for general instances.

Hard Peak Instances
Peaks are already hard for both streaming algorithms and property testers. Indeed, consider the language Disj ⊆ Λ over alphabet Σ = {0, 1, 0̄, 1̄, a} and defined as the union of all languages • a * , where j ≥ 1, x, y ∈ {0, 1}^j, and x(i)y(i) = 1 for all i.
Then Disj can be recognized by a Vpa with 3 states, $\Sigma_+$ = {0, 1}, $\Sigma_-$ = {0̄, 1̄} and $\Sigma_=$ = {a}. However, the following fact states its hardness for both models. The hardness for non-approximation streaming algorithms comes from a standard reduction to Set-Disjointness. The hardness for property testing algorithms is a corollary of a similar result due to [25] for parenthesis languages with two types of parentheses.
Fact 8. Any randomized p-pass streaming algorithm for Disj requires memory space Ω(n/p), where n is the input length. Moreover, any (non-streaming) (2^{−6})-tester for Disj requires querying Ω(n^{1/11} / log n) letters of the input word.
Surprisingly, for every ε > 0, we will show that languages of the form L ∩ Λ, where L is a Vpl, become easy to ε-test by streaming algorithms. This is mainly because, given their full access to the input, streaming algorithms can perform an input sampling which makes the property testing task easy, using only a single pass and little memory.

Slicing Automaton
Observe that Algorithm 2 will never use the stack in the case of a single peak. After Algorithm 2 has processed the i-th letter of the data stream, u_0 contains u[1, i] from which the possible initial sequence of neutral symbols has been removed. We will show how to compute R_{u_0} at line 14 using a standard finite state automaton without any stack.
Indeed, for every Vpl L, one can construct a regular language L such that testing whether u ∈ L ∩ Λ is equivalent to test whether some other word u belongs to L. For this, let I be a special symbol not in Σ = encoding the relation set {(p, p) : p ∈ Q}.For a word v ∈ Σ l = , write [v, I] for the word (v(1), I) Then the slicing of u (see Figure 2) is the word u over the alphabet Σ and which has weight The slicing of A is the finite automaton where the transitions ∆ are: Run in the slicing automaton A on u This construction will be later used in Section 5 for weighted languages.In that case, we define the weight of a letter in u by |(a, b)| = |a| + |b|, with the convention that |I| = 0.Moreover, we write Σ Q for the alphabet obtained similarly to Σ using Σ Q instead of Σ = .Note that the slicing automaton A defined on Σ Q is Σ-closed and has Σ-diameter at most 2m 2 where m = |Q|.Indeed, the slicing automaton has m 2 states and every letter in Σ has weight at most 2, hence the shortest path from two states (when exists) has weight at most 2m 2 .In particular, it directly implies the following.

Random Sketches
We are now ready to build a tester for L ∩ Λ.To test a word u we use a property tester for the regular language L. Regular languages are known to be ε-testable for the Hamming distance with O((log 1/ε)/ε) non-adaptive queries on the input word [1], that is queries that can all be made simultaneously.Those queries define a small random sketch of u that can be sent to the tester for approximating R u .Since the Hamming distance is larger than the edit distance, those testers are also valid for the latter distance.Observe also that, for The only remaining difficulty is to provide to the tester an appropriate sampling on u while processing u.
We will proceed similarly for the general case in Section 5, but then we will have to consider weighted words. Therefore we show how to sketch u in that general case already. Indeed, the tester of [1] was simplified for the edit distance in [23], and later on adapted for weighted words in [24]. We consider here an alternative approach that we believe to be simpler, but slightly less efficient than the tester of [24].
Our tester for weighted regular languages is based on k-factor sampling on u that we will simulate by an over-sampling built from a letter sampling on u, that is, according to the weights of the letters of u only. This new sampling can be easily performed given a stream of u using a standard reservoir sampling.
Figure 3 The sampling W_k(u) from Definition 12 (the sample is shown in red).
Let u ∈ Λ and let u[i, i + k] be a factor that contains at least one push symbol. Call i_1 (resp. i_2) the smallest (resp. largest) integer such that i_1 ≥ i (resp. i_2 ≤ i + k) and u(i_1) (resp. u(i_2)) is a push symbol. Then the matching pop sequence of u[i, i + k] is defined as u[j_1, j_2] where u(j_1) (resp. u(j_2)) is the matching pop symbol of u(i_1) (resp. u(i_2)).
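Definition 12. For a weighted word u ∈ $\Lambda_Q$, denote by W_k(u) the sampling over subwords of u constructed as follows (see Figure 3):
(1) Sample a factor u[i, i + k] of u with probability |u(i)|/|u|.
(2) If u[i, i + k] contains at least one push symbol, let u[j, j′] be the matching pop sequence of u[i, i + k], extended by the first k neutral symbols after the last pop symbol, if any. Add u[max(j, j′ − 2k), j′] to the sample (hence, some matching pops of u[i, i + k] may not belong to u[max(j, j′ − 2k), j′]).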
Let us stress that in the above definition the weights of letters only matter in (1), and not in (2), which is concerned with matching push and pop symbols, which are of weight 1. One consequence is that one can design a randomized streaming algorithm performing this sampling.
Fact 13. There is a randomized streaming algorithm with memory O(k + log n) which, given k and u as input, samples W_k(u).

Lemma 14. Let u be a weighted word, and let k be such that 4k ≤ |u|. Then 4k independent copies of W_k(u) over-sample the k-factor sampling on u.
We can now give an analogue of the property tester for weighted regular languages in L ∩ Λ Q .For that, we use the following notion of approximation.
Our tester is going to be robust enough in order to consider samples that do not exactly match the peaks we want to compress.

where a is a letter.
Let v′ be obtained from v by at most ε|v| balanced deletions. Then, the conclusion is still true if the algorithm is given an independent W_k(v′) for each z_i instead, except that R now provides a (3ε, Σ)-approximation. Last, each sampling can be replaced by an over-sampling.
As a consequence we get our first streaming tester for L ∩ Λ.
Theorem 17. Let A be a Vpa for L with m ≥ 2 states, and let ε, η > 0. Then there is a streaming ε-tester for L ∩ Λ with one-sided error η and memory space O((m^8 log(1/η)/ε^2)(m^3/ε + log n)), where n is the input length.
Proof.We use Algorithm 2 where we replace the current factor u 0 by T = 4kt independent samplings W k (u 0 ).We know that such samplings can be computed using memory space O(k + log n) by Fact 13.By Proposition 10, the slicing automaton has Σ-diameter d at most 2m 2 .Therefore, from Theorem 16, taking t = 4 4dm 3 (log 1/η)/ε and k = 4dm/ε leads to the desired conclusion.

Algorithm With Sketching

Sketching Using Suffix Samplings
We now describe the sketches used by our main algorithm. They are based on a generalization of the random sketches described in Section 4.3. Moreover, they rely on a notion of suffix sampling, which ensures a good letter sampling on each suffix of a data stream. Recall (see Section 2.1) that a letter sampling on a weighted word u samples a random letter u(i) (with its position) with probability |u(i)|/|u|, and that a sampling on k-factors can be derived from a letter sampling. Therefore we will sample k-factors using an (α, t)-suffix sampling.
Definition 18. Let u be a weighted word and let α > 1. An α-suffix decomposition of u of size s (see Figure 4) is a sequence of suffixes (u_l)_{1≤l≤s} of u such that: u_1 = u, u_s is the last letter of u, and for all l, u_{l+1} is a strict suffix of u_l and, if |u_l| > α|u_{l+1}|, then u_l = a • u_{l+1} where a is a single letter. An (α, t)-suffix sampling on u of size s is an α-suffix decomposition of u of size s with t letter samplings on each suffix of the decomposition.
We observe that (α, t)-suffix samplings can be either concatenated or compressed as stated below.

Proof. For Concatenate, it suffices to do the following. For each suffix u_l of D_u: (1) replace u_l by u_l • v; and (2) replace the i-th sampling of u_l by the i-th sampling of v with probability |v|/(|u| + |v|), for i = 1, . . ., t.
For Simplify, do the following. For each suffix u_l of D_u, from l = s_u (the smallest one) to l = 1 (the largest one): (1) replace all suffixes u_{l−1}, u_{l−2}, . . ., u_m by the largest suffix u_m such that |u_m| ≤ α|u_l|; and (2) suppress all samples from deleted suffixes.
Using this proposition, one can easily design a streaming algorithm constructing online a suffix decomposition of polylogarithmic size. Starting with an empty suffix-sampling S, simply concatenate S with the next processed letter a of the stream, and then simplify it.
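The online construction just described can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it maintains only the α-suffix decomposition (each suffix is stored as a pair of its start position and its weight) and omits the t letter samplings attached to each suffix. It also assumes that the condition of Definition 18 reads: either |u_l| ≤ α|u_{l+1}|, or u_l extends u_{l+1} by a single letter. The names append_letter and size_bound are hypothetical.

```python
import math


def append_letter(suffixes, position, weight, alpha):
    """Concatenate one letter to the decomposition, then simplify it.

    `suffixes` is a list of (start_position, weight) pairs, ordered from
    the largest suffix (the whole word) to the smallest (the last letter).
    """
    # Concatenate: every suffix is extended by the new letter, and the
    # new letter alone becomes the smallest suffix.
    suffixes = [(s, w + weight) for (s, w) in suffixes] + [(position, weight)]

    # Simplify: scan from the smallest suffix to the largest one, jumping
    # each time to the largest suffix of weight at most alpha times the
    # current one (or to the next suffix when there is none).
    kept = [suffixes[-1]]
    i = len(suffixes) - 1
    while i > 0:
        bound = alpha * suffixes[i][1]
        m = i - 1
        while m > 0 and suffixes[m - 1][1] <= bound:
            m -= 1
        kept.append(suffixes[m])
        i = m
    kept.reverse()
    return kept


def size_bound(total_weight, alpha):
    """Size bound 2 log|u| / log(alpha), with an additive slack of 2."""
    return 2 * math.log(total_weight) / math.log(alpha) + 2
```

Feeding the letters of a stream one by one to append_letter keeps the whole word as first suffix and the last letter as last suffix. In this sketch, the weight grows by a factor larger than α every two suffixes, which is what keeps the size logarithmic in |u|.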

Final Algorithm
Our final algorithm is a modification of Algorithm 2: in particular it approximates relations R_v (in the spirit of Definition 15) by elements in $\Sigma_Q$, instead of exactly computing them. Let us stress that even if some R_v is approximated by an R that does not correspond to any R_u, one has R ∈ $\Sigma_Q$, which means that for any (p, q) ∈ R, there is a balanced word u ∈ Σ* depending on (p, q) with $p \xrightarrow{u} q$.
To mimic Algorithm 2 we need to encode (compactly) each unfinished peak v of the stack and u_0: for that we use the data structure described in Data Structure 3. Our final algorithm, Algorithm 4, is simply Algorithm 2 with this new data structure and correspondingly adapted operations, where ε′ = ε/(6 log n), T = 4608 m^4 2^{2m^2} (log^2 n)(log 1/η)/ε′^2 and k = 24 m 2^{m^2} (log n)/ε′.
The methods are described in Algorithm 4, where we implicitly assume that each letter processed by the algorithm comes with its respective height and (exact or approximate) weight. They use functions Concatenate and Simplify described in Proposition 19, while adapting them.
In the next section, we show that the samplings S_{v_l} are close enough to a (1 + ε′)-suffix sampling on v_l. This lets us build an over-sampling of a (1 + ε′)-suffix sampling. We also show that it only requires a polylogarithmic number of samples. Then, we explain how to recursively apply the tester from Theorem 16 (with ε′) in order to obtain the compressions at lines 14 and 20 while keeping a cumulative error below ε. We now state our main result, whose proof relies on Lemmas 22 and 23.

Algorithm 1 Reservoir Sampling
Input: Data stream u, integer t > 1 standing for the number of samples
Data structure:
    σ ← 0 // Current weight of the processed stream
    S ← empty multiset // Multiset of sampled letters
Code:
    a ← Next(u), σ ← |a|
    S ← t copies of a
    While u not finished
        a ← Next(u), σ ← σ + |a|
        For each b ∈ S
            Replace b by a with probability |a|/σ
    Output S
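Algorithm 1 can be transcribed into Python along the following lines. This is a minimal sketch, not a reference implementation: the stream is assumed to be an iterable of (letter, weight) pairs with positive integer weights, samples are stored together with their 1-indexed position, and the function name and the optional rng parameter are choices made for this example.

```python
import random


def reservoir_letter_sampling(stream, t, rng=None):
    """Return t independent letter samples of a weighted stream.

    After the whole stream is read, each of the t samples is the letter
    at position i with probability |u(i)| / |u|.
    """
    rng = rng or random.Random()
    sigma = 0          # current weight of the processed stream
    samples = []       # multiset of sampled (position, letter) pairs
    for position, (letter, weight) in enumerate(stream, start=1):
        sigma += weight
        if not samples:
            # The first letter fills all t slots.
            samples = [(position, letter)] * t
        else:
            # Each slot is independently replaced with probability |a|/sigma.
            for j in range(t):
                if rng.random() < weight / sigma:
                    samples[j] = (position, letter)
    return samples
```

Only the current total weight and the t samples are stored, so the memory does not depend on the length of the stream beyond the counters.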

Lemma 7. Let v be the factor used to compute R_v at either line 14 or line 20 of Algorithm 2. Then |v(i)| ≤ 2|v|/3 for all i. Moreover, for any factor w constructed by Algorithm 2 it holds that Depth(w) = O(log |w|).

Figure 2 Slicing of a word u ∈ Λ.
Steps of the construction of W_k(u) in Definition 12: (1) Sample a factor u[i, i + k] of u with probability |u(i)|/|u|. (2) If u[i, i + k] contains at least one push symbol, let u[j, j′] be the matching pop sequence of u[i, i + k], extended by the first k neutral symbols after the last pop symbol, if any.

Proposition 19. Given an (α, t)-suffix sampling D_u on u of size s_u and another one D_v on v of size s_v, there is an algorithm Concatenate(D_u, D_v) computing an (α, t)-suffix sampling on the concatenated word u • v of size at most s_u + s_v in time O(s_u). Moreover, given an (α, t)-suffix sampling D_u on u of size s_u, there is an algorithm Simplify(D_u) computing an (α, t)-suffix sampling on u of size at most 2 log |u| / log α in time O(s_u).

Data Structure 3 Sketch for an unfinished peak
Parameters: real ε′ > 0, integers T ≥ 1 and k ≥ 1.
Data structure for a weighted word v ∈ Prefix($\Lambda_Q$):
    Weights of v and of its first letter v(1)
    Height of v(1)
    Boolean indicating whether v contains a pop symbol
    (1 + ε′)-suffix decomposition v_1, . . ., v_s of v, encoded for l = 1, . . ., s by
        Estimates |v_l|_low and |v_l|_high of |v_l|
        T independent samplings S_{v_l} on k-factors of v_l, with corresponding weights and heights // See details below