Autocorrelation on words and its applications. Analysis of suffix trees by string-ruler approach

Abstract : This paper studies in a probabilistic framework some topics concerning the way words (strings) can overlap, and relationship of this to the depth of a digital tree associated with this set of words. A word is defined as a (random) sequence of (possibly infinite) symbols over a finite alphabet. By depth of a word (or the associated suffix tree) we mean the average length of a string that can be recopied. In other words, the depth is a measure of compressions for words, and as such finds many applications in computer sciences and telecommunications, most notably in coding theory, theory of languages and design and analysis of algorithms. Our main finding shows that the depth of a word (suffix tree) is asymptotically equivalent to the depth of a set of independent words (i.e., independent tries). More precisely, let us consider the first n suffixes of a random word. Then, the depth of a suffix tree built from these suffixes is normally distributed with the mean 1/h1 . log n and the variance a . log n where h1 is entropy of the alphabet, and a is a parameter of the probabilistic model. Finally, we present some consequence of our findings for the design and analysis of algorithms on words. In particular, we consider compression schemes that compress repated patterns by sending pointers to them instead of the patterns themselves, and prove that such a compression is not efficient for random strings.
Type de document :
Rapport
RR-1106, INRIA. 1989
Liste complète des métadonnées

https://hal.inria.fr/inria-00075453
Contributeur : Rapport de Recherche Inria <>
Soumis le : mercredi 24 mai 2006 - 18:15:16
Dernière modification le : vendredi 16 septembre 2016 - 15:11:39
Document(s) archivé(s) le : mardi 12 avril 2011 - 23:10:05

Fichiers

Identifiants

  • HAL Id : inria-00075453, version 1

Collections

Citation

Philippe Jacquet, Wojciec Szpankowski. Autocorrelation on words and its applications. Analysis of suffix trees by string-ruler approach. RR-1106, INRIA. 1989. 〈inria-00075453〉

Partager

Métriques

Consultations de la notice

179

Téléchargements de fichiers

55