# An Algorithm for Estimating all Matches Between Two Strings

ALGO - Algorithms, Inria Paris-Rocquencourt
Abstract: We give a randomized algorithm for estimating the score vector of matches between a text string $T$ of length $N$ and a pattern string $P$ of length $M$; this is the vector obtained when the pattern is slid along the text and the number of matches is counted for each position. The randomized algorithm takes deterministic time $O((N/M)\,\mathit{Conv}(M))$, where $\mathit{Conv}(M)$ is the time for performing a convolution of two vectors of size $M$ each. The algorithm finds an unbiased estimator of the scores, whose variance is particularly small for scores that are close to $M$, i.e., for approximate occurrences of the pattern in the text. No assumptions are made about the probabilistic characteristics of the input, or about the number of different symbols appearing in $T$ or $P$ (i.e., the alphabet size need not be much smaller than $M$). The solution extends to the weighted case and to higher dimensions.
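To make the notion of a score vector and of an unbiased estimator concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the report's own construction, which the abstract does not spell out. The function names `exact_scores` and `estimate_scores` are hypothetical. The estimator assigns each symbol an independent random unit-modulus complex number and averages the real part of the resulting correlation over several trials; a matching pair contributes exactly 1 and a mismatching pair contributes 0 in expectation. For readability the correlation is computed by a direct loop, whereas a fast version would use an FFT-based convolution on blocks of the text.

```python
import cmath
import random


def exact_scores(text, pattern):
    """Score vector by definition: number of matches at each alignment."""
    n, m = len(text), len(pattern)
    return [sum(1 for j in range(m) if text[i + j] == pattern[j])
            for i in range(n - m + 1)]


def estimate_scores(text, pattern, trials=200, seed=0):
    """Randomized estimate of the score vector (illustrative assumption).

    Each trial maps every symbol to a random point on the unit circle.
    For equal symbols z * conj(z) = 1; for different symbols the product
    has expectation 0, so the averaged real part is an unbiased estimate.
    """
    rng = random.Random(seed)
    n, m = len(text), len(pattern)
    alphabet = sorted(set(text) | set(pattern))
    totals = [0.0] * (n - m + 1)
    for _ in range(trials):
        # Independent uniform random phase for each symbol.
        phase = {s: cmath.exp(2j * cmath.pi * rng.random()) for s in alphabet}
        t = [phase[c] for c in text]
        p = [phase[c].conjugate() for c in pattern]
        for i in range(n - m + 1):
            # Direct correlation; an FFT-based convolution would replace
            # this inner loop in a fast implementation.
            totals[i] += sum(t[i + j] * p[j] for j in range(m)).real
    return [x / trials for x in totals]


exact = exact_scores("abracadabra", "abr")      # [3, 0, 0, 1, 0, 1, 0, 3, 0]
approx = estimate_scores("abracadabra", "abr")  # close to `exact` on average
```

In this sketch, an alignment where the pattern matches completely is estimated with no random error, since every term equals 1 in every trial. This agrees with the abstract's remark that the variance is small for scores close to $M$.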
Document type: Report
[Research Report] RR-3194, INRIA. 1997

https://hal.inria.fr/inria-00073495
Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, May 24, 2006 - 13:02:47
Last modified on: Tuesday, April 17, 2018 - 11:25:22
Document(s) archived on: Thursday, March 24, 2011 - 12:52:01

### Identifiers

• HAL Id: inria-00073495, version 1

### Citation

Mikhail J. Atallah, Frédéric Chyzak, Philippe Dumas. An Algorithm for Estimating all Matches Between Two Strings. [Research Report] RR-3194, INRIA. 1997. 〈inria-00073495〉
