Linear-size suffix tries

Maxime Crochemore; Chiara Epifanio; Roberto Grossi; Filippo Mignosi

doi:10.1016/j.tcs.2016.04.002

Journal article, Theoretical Computer Science, Year: 2016

Maxime Crochemore

Role: Author
PersonId : 5397
IdHAL : maximecrochemore
ORCID : 0000-0003-1087-1419
IdRef : 034037357

Laboratoire d'Informatique Gaspard-Monge

Department of Computer Science [London]

Chiara Epifanio

Role: Author

Dipartimento di Matematica e Applicazioni [Palermo]

Roberto Grossi

Role: Author

Dipartimento di Informatica [Pisa]

Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale

Filippo Mignosi

Role: Author

Università degli Studi dell'Aquila = University of L'Aquila

Abstract

Suffix trees are highly regarded data structures for text indexing and string algorithms [McCreight 76, Weiner 73]. For any given string w of length n = |w|, a suffix tree for w has O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. The compaction process replaces each maximal chain of unary nodes with a single arc. For this to work, the suffix tree requires that the labels of its arcs be substrings encoded as pointers into w (or equivalent information). By contrast, the arcs of the suffix trie are labeled by single symbols, but a suffix trie can have Θ(n²) nodes and links in the worst case because of its unary nodes. It is an interesting question whether the suffix trie can be stored using O(n) nodes. We present the linear-size suffix trie, which guarantees O(n) nodes. We use a new technique for reducing the number of unary nodes to O(n), which stems from some results on antidictionaries. For instance, by using the linear-size suffix trie, we can check whether a pattern p of length m = |p| occurs in w in O(m log |Σ|) time, and we can find the longest common substring of two strings w1 and w2 in O((|w1|+|w2|) log |Σ|) time for an alphabet Σ.
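To make the size gap described in the abstract concrete, the following is a minimal sketch of a plain (uncompacted) suffix trie. It is not the linear-size suffix trie of the paper, whose construction is not given on this page. The names `Node`, `build_suffix_trie`, `count_nodes` and `occurs` are hypothetical, chosen for illustration only. Children are stored in a hash map, so a lookup costs expected O(1) per symbol; the O(m log |Σ|) bound quoted above would instead correspond to children kept in sorted order and located by binary search.

```python
class Node:
    """One suffix-trie node; each outgoing arc carries a single symbol."""
    __slots__ = ("children",)

    def __init__(self):
        self.children = {}  # symbol -> Node


def build_suffix_trie(w):
    """Insert every suffix of w into a trie (quadratic time and space)."""
    root = Node()
    for start in range(len(w)):
        node = root
        for symbol in w[start:]:
            child = node.children.get(symbol)
            if child is None:
                child = Node()
                node.children[symbol] = child
            node = child
    return root


def count_nodes(root, skip_unary=False):
    """Count trie nodes; with skip_unary=True, count only the root,
    branching nodes and leaves, i.e. the nodes left after compaction."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        if not skip_unary or node is root or len(node.children) != 1:
            total += 1
        stack.extend(node.children.values())
    return total


def occurs(root, p):
    """Return True if p labels a path from the root (p is a substring of w)."""
    node = root
    for symbol in p:
        node = node.children.get(symbol)
        if node is None:
            return False
    return True


trie = build_suffix_trie("aaabbb")
print(count_nodes(trie))                   # all nodes, unary ones included
print(count_nodes(trie, skip_unary=True))  # nodes that survive compaction
print(occurs(trie, "abb"), occurs(trie, "ba"))
```

For strings of the form w = aᵏbᵏ, every string aⁱbʲ with 0 ≤ i, j ≤ k is a distinct substring, so the plain trie has (k+1)² nodes, while the count with `skip_unary=True` grows only linearly in k. This is the quadratic blow-up caused by unary nodes that the abstract refers to.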

Keywords

Factor and suffix automata; Pattern matching; Suffix tree; Text indexing

Domains

Data Structures and Algorithms [cs.DS]

Main file

LinearSizeSuffixTrie4.pdf (187.62 KB)

Origin: Files produced by the author(s)

Contributor: Marie-France Sagot

https://inria.hal.science/hal-01388452

Submitted on: Tuesday, May 30, 2017, 13:45:26

Last modified on: Saturday, April 27, 2024, 03:13:17

Long-term archiving on: Wednesday, September 6, 2017, 13:27:50

Dates and versions

hal-01388452, version 1 (30-05-2017)

Identifiers

HAL Id: hal-01388452, version 1
DOI: 10.1016/j.tcs.2016.04.002

Cite

Maxime Crochemore, Chiara Epifanio, Roberto Grossi, Filippo Mignosi. Linear-size suffix tries. Theoretical Computer Science, 2016, 638, pp. 171-178. ⟨10.1016/j.tcs.2016.04.002⟩. ⟨hal-01388452⟩

Collections

ENPC CNRS INRIA INSMI PARISTECH LIGM LIGM_MOA INRIA2 ESIEE-PARIS UNIV-EIFFEL LIGM_ADA JSE2024
