An Algorithm for Estimating all Matches Between Two Strings

Mikhail J. Atallah; Frédéric Chyzak; Philippe Dumas

Report (Research Report), Year: 1997


Mikhail J. Atallah

Role: Author

Frédéric Chyzak

Role: Author
PersonId: 1159
IdHAL: frederic-chyzak
ORCID: 0000-0003-3114-1191
IdRef: 15406257X

Algorithms

Philippe Dumas

Role: Author
PersonId: 735360
IdHAL: philippe-r-a-dumas
ORCID: 0000-0001-9360-3844

Algorithms

Abstract

We give a randomized algorithm for estimating the score vector of matches between a text string $T$ of length $N$ and a pattern string $P$ of length $M$; this is the vector obtained when the pattern is slid along the text and the number of matches is counted at each position. The algorithm runs in deterministic time $O((N/M)\,\mathit{Conv}(M))$, where $\mathit{Conv}(M)$ is the time needed to convolve two vectors of size $M$ each. It computes an unbiased estimator of the scores whose variance is particularly small for scores close to $M$, i.e., for approximate occurrences of the pattern in the text. No assumptions are made about the probabilistic characteristics of the input or about the number of distinct symbols appearing in $T$ or $P$ (i.e., the alphabet size need not be much smaller than $M$). The solution extends to the weighted case and to higher dimensions.
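The abstract does not spell out how the estimator is built, so the following is only an illustrative sketch of one way to obtain an unbiased, convolution-based estimate of the score vector. It assumes each symbol is mapped to a random $q$-th root of unity and that several independent trials are averaged. The function names, the parameters `trials` and `q`, and the small recursive FFT are all choices made for this example, not taken from the report. For simplicity it uses one transform over the whole text rather than the blockwise scheme implied by the $O((N/M)\,\mathit{Conv}(M))$ bound.

```python
import cmath
import random


def exact_scores(text, pattern):
    """Reference score vector: matches at each alignment, in O(N*M) time."""
    n, m = len(text), len(pattern)
    return [sum(1 for j in range(m) if text[i + j] == pattern[j])
            for i in range(n - m + 1)]


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate(x, y):
    """Return c[i] = sum_j x[i+j] * conj(y[j]) for every full alignment i."""
    n, m = len(x), len(y)
    size = 1
    while size < n + m:
        size *= 2
    # Convolving with the reversed, conjugated pattern yields the correlation.
    y_rev = [v.conjugate() for v in reversed(y)]
    fx = fft(list(x) + [0j] * (size - n))
    fy = fft(y_rev + [0j] * (size - m))
    conv = fft([a * b for a, b in zip(fx, fy)], invert=True)
    return [v / size for v in conv[m - 1:n]]


def estimate_scores(text, pattern, trials=32, q=4, seed=0):
    """Average over random symbol -> root-of-unity maps (assumed scheme).

    A matching pair contributes exactly 1 in every trial; a mismatching
    pair contributes a value with expectation 0, so the average is an
    unbiased estimate of the match count.
    """
    rng = random.Random(seed)
    alphabet = sorted(set(text) | set(pattern))
    total = [0.0] * (len(text) - len(pattern) + 1)
    for _ in range(trials):
        phi = {s: cmath.exp(2j * cmath.pi * rng.randrange(q) / q)
               for s in alphabet}
        c = correlate([phi[s] for s in text], [phi[s] for s in pattern])
        for i, v in enumerate(c):
            total[i] += v.real
    return [t / trials for t in total]


if __name__ == "__main__":
    T, P = "abracadabra", "abra"
    print(exact_scores(T, P))
    print([round(s, 2) for s in estimate_scores(T, P)])
```

In this sketch, an alignment where the pattern matches completely is estimated as exactly $M$ (up to floating-point error), because every term equals 1 regardless of the random map. This agrees with the abstract's remark that the variance is smallest for scores close to $M$.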

Keywords

ALGORITHMS, CONVOLUTION, PATTERN MATCHING

Domains

Other [cs.OH]

Main file

RR-3194.pdf (372.31 KB)

Contributor: Rapport De Recherche Inria

https://inria.hal.science/inria-00073495

Submitted on: Wednesday, May 24, 2006, 13:02:47

Last modified on: Tuesday, February 7, 2023, 03:39:59

Long-term archiving on: Thursday, March 24, 2011, 12:52:01

Dates and versions

inria-00073495 , version 1 (24-05-2006)

Identifiers

HAL Id : inria-00073495 , version 1

Cite

Mikhail J. Atallah, Frédéric Chyzak, Philippe Dumas. An Algorithm for Estimating all Matches Between Two Strings. [Research Report] RR-3194, INRIA. 1997. ⟨inria-00073495⟩


Collections

INRIA INRIA-RRRT INRIA2 LARA

