HAL will be down for maintenance from Friday, June 10 at 4pm through Monday, June 13 at 9am. More information
Skip to Main content Skip to Navigation

An Algorithm for Estimating all Matches Between Two Strings

Abstract : We give a randomized algorithm for estimating the score vector of matches between a text string of length~$N$ and a pattern string of length~$M$; this is the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. The randomized algorithm takes deterministic time $O( (N / M ) {\textstyle {\it Conv}} (M) )$ where ${\textstyle {\it Conv}} (M)$ is the time for performing a convolution of two vectors of size $M$ each. The algorithm finds an unbiased estimator of the scores, whose variance is particularly small for scores that are close to $M$, i.e., for approximate occurrences of the pattern in the text. No assumptions are made about the probabilistic characteristics of the input, or about the number of different symbols appearing in $T$ or $P$ (i.e., the alphabet size need not be much smaller than $M$). The solution extends to the weighted case and to higher dimensions.
Document type :
Complete list of metadata

Contributor : Rapport de Recherche Inria Connect in order to contact the contributor
Submitted on : Wednesday, May 24, 2006 - 1:02:47 PM
Last modification on : Friday, February 4, 2022 - 3:10:20 AM
Long-term archiving on: : Thursday, March 24, 2011 - 12:52:01 PM


  • HAL Id : inria-00073495, version 1



Mikhail J. Atallah, Frédéric Chyzak, Philippe Dumas. An Algorithm for Estimating all Matches Between Two Strings. [Research Report] RR-3194, INRIA. 1997. ⟨inria-00073495⟩



Record views


Files downloads