Minimized Compact Automaton for Clumps over Degenerate Patterns - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Pré-Publication, Document De Travail Année : 2019

Minimized Compact Automaton for Clumps over Degenerate Patterns

Résumé

Clumps are sequences of overlapping occurrences of a given pattern that play a vital role in the study of distribution of pattern occurrences. These distributions are sused for finding functional fragments in biological sequences. In this paper we present a minimized compacted automaton (Overlap walking automaton, OWA) recognizing all the possible clumps for degenerate patterns and its usage for computation of probabilities of sets of clumps. We also present Aho-Corasick like automaton, RMinPatAut, recognizing all the sequences ending with pattern occurrences. The states of RMinPatAut are equivalence classes on the prefixes of the pattern words. We have proved that RMinPatAut is Nerode-minimal, i.e., minimal in classical sense. We use RMinPatAut as an auxiliary structure for OWA construction. For degenerate pattern, RMinPatAut can be constructed in linear time on the number of its states (it is bounded by 2m, where m the length of pattern words). OWA can be constructed in linear time on the sum of its size and RMinPatAut size.
Fichier principal
Vignette du fichier
OWAv5 (3).pdf (449.92 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01940837 , version 1 (30-11-2018)
hal-01940837 , version 2 (24-01-2019)

Identifiants

  • HAL Id : hal-01940837 , version 2

Citer

Eugenia Furletova, Jan Holub, Mireille Regnier. Minimized Compact Automaton for Clumps over Degenerate Patterns. 2019. ⟨hal-01940837v2⟩
129 Consultations
113 Téléchargements

Partager

Gmail Facebook X LinkedIn More