Skip to Main content Skip to Navigation
Preprints, Working Papers, ...

Minimized Compact Automaton for Clumps over Degenerate Patterns

Abstract : Clumps are sequences of overlapping occurrences of a given pattern that play a vital role in the study of distribution of pattern occurrences. These distributions are sused for finding functional fragments in biological sequences. In this paper we present a minimized compacted automaton (Overlap walking automaton, OWA) recognizing all the possible clumps for degenerate patterns and its usage for computation of probabilities of sets of clumps. We also present Aho-Corasick like automaton, RMinPatAut, recognizing all the sequences ending with pattern occurrences. The states of RMinPatAut are equivalence classes on the prefixes of the pattern words. We have proved that RMinPatAut is Nerode-minimal, i.e., minimal in classical sense. We use RMinPatAut as an auxiliary structure for OWA construction. For degenerate pattern, RMinPatAut can be constructed in linear time on the number of its states (it is bounded by 2m, where m the length of pattern words). OWA can be constructed in linear time on the sum of its size and RMinPatAut size.
Complete list of metadata

Cited literature [20 references]  Display  Hide  Download

https://hal.inria.fr/hal-01940837
Contributor : Mireille Regnier <>
Submitted on : Thursday, January 24, 2019 - 6:56:04 PM
Last modification on : Friday, April 30, 2021 - 9:55:57 AM

File

OWAv5 (3).pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01940837, version 2

Collections

Citation

Eugenia Furletova, Jan Holub, Mireille Regnier. Minimized Compact Automaton for Clumps over Degenerate Patterns. 2019. ⟨hal-01940837v2⟩

Share

Metrics

Record views

122

Files downloads

229