Skip to Main content Skip to Navigation
Reports

Autocorrelation on words and its applications. Analysis of suffix trees by string-ruler approach

Abstract : This paper studies in a probabilistic framework some topics concerning the way words (strings) can overlap, and relationship of this to the depth of a digital tree associated with this set of words. A word is defined as a (random) sequence of (possibly infinite) symbols over a finite alphabet. By depth of a word (or the associated suffix tree) we mean the average length of a string that can be recopied. In other words, the depth is a measure of compressions for words, and as such finds many applications in computer sciences and telecommunications, most notably in coding theory, theory of languages and design and analysis of algorithms. Our main finding shows that the depth of a word (suffix tree) is asymptotically equivalent to the depth of a set of independent words (i.e., independent tries). More precisely, let us consider the first n suffixes of a random word. Then, the depth of a suffix tree built from these suffixes is normally distributed with the mean 1/h1 . log n and the variance a . log n where h1 is entropy of the alphabet, and a is a parameter of the probabilistic model. Finally, we present some consequence of our findings for the design and analysis of algorithms on words. In particular, we consider compression schemes that compress repated patterns by sending pointers to them instead of the patterns themselves, and prove that such a compression is not efficient for random strings.
Document type :
Reports
Complete list of metadata

https://hal.inria.fr/inria-00075453
Contributor : Rapport de Recherche Inria <>
Submitted on : Wednesday, May 24, 2006 - 6:15:16 PM
Last modification on : Thursday, February 11, 2021 - 2:50:07 PM
Long-term archiving on: : Tuesday, April 12, 2011 - 11:10:05 PM

Identifiers

  • HAL Id : inria-00075453, version 1

Collections

Citation

Philippe Jacquet, Wojciec Szpankowski. Autocorrelation on words and its applications. Analysis of suffix trees by string-ruler approach. RR-1106, INRIA. 1989. ⟨inria-00075453⟩

Share

Metrics

Record views

217

Files downloads

95