Skip to Main content Skip to Navigation

Autocorrelation on words and its applications. Analysis of suffix trees by string-ruler approach

Abstract : This paper studies in a probabilistic framework some topics concerning the way words (strings) can overlap, and relationship of this to the depth of a digital tree associated with this set of words. A word is defined as a (random) sequence of (possibly infinite) symbols over a finite alphabet. By depth of a word (or the associated suffix tree) we mean the average length of a string that can be recopied. In other words, the depth is a measure of compressions for words, and as such finds many applications in computer sciences and telecommunications, most notably in coding theory, theory of languages and design and analysis of algorithms. Our main finding shows that the depth of a word (suffix tree) is asymptotically equivalent to the depth of a set of independent words (i.e., independent tries). More precisely, let us consider the first n suffixes of a random word. Then, the depth of a suffix tree built from these suffixes is normally distributed with the mean 1/h1 . log n and the variance a . log n where h1 is entropy of the alphabet, and a is a parameter of the probabilistic model. Finally, we present some consequence of our findings for the design and analysis of algorithms on words. In particular, we consider compression schemes that compress repated patterns by sending pointers to them instead of the patterns themselves, and prove that such a compression is not efficient for random strings.
Document type :
Complete list of metadata
Contributor : Rapport de Recherche Inria Connect in order to contact the contributor
Submitted on : Wednesday, May 24, 2006 - 6:15:16 PM
Last modification on : Friday, February 4, 2022 - 3:17:41 AM
Long-term archiving on: : Tuesday, April 12, 2011 - 11:10:05 PM


  • HAL Id : inria-00075453, version 1



Philippe Jacquet, Wojciec Szpankowski. Autocorrelation on words and its applications. Analysis of suffix trees by string-ruler approach. RR-1106, INRIA. 1989. ⟨inria-00075453⟩



Record views


Files downloads