A Probabilistic analysis of a string edit problem

Guy Louchard, Wojciech Szpankowski 1
1 ALGO - Algorithms
Inria Paris-Rocquencourt
Abstract : We consider a string edit problem in a probabilistic framework. This problem is of considerable interest to many facets of science, most notably molecular biology and computer science. String editing transforms one string into another by performing a series of weighted edit operations of overall maximum (or minimum) cost. An edit operation can be the deletion, the insertion, or the substitution of a symbol. We assume that these weights can be arbitrarily distributed. We reduce the problem to finding an optimal path in a weighted grid graph and provide several results regarding the typical behavior of such a path. In particular, we observe that the cost of the optimal path (i.e., the edit distance) is asymptotically almost surely (a.s.) equal to a·n, where a is a constant and n is the sum of the lengths of both strings. We also obtain explicit bounds on the constant a. More importantly, we show that the edit distance is well concentrated around its average value. As a by-product of our results, we also present a precise estimate of the number of alignments between two strings. To prove these findings we use techniques of random walks, diffusion limiting processes, generating functions, and the method of bounded differences.
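The reduction to an optimal path in a weighted grid graph can be illustrated with a short dynamic program. The Python sketch below is not taken from the report: the function names, the `weight` callback, and the unit weights are hypothetical choices made for illustration, whereas the report allows arbitrarily distributed weights. The path-counting helper assumes that an alignment corresponds to a monotone grid path built from horizontal, vertical, and diagonal steps, which may differ from the report's exact definition.

```python
from typing import Callable, Optional, Sequence

# Hypothetical weight signature: weight(a, b) is the cost of turning symbol a
# into symbol b; a is None for an insertion, b is None for a deletion.
Weight = Callable[[Optional[str], Optional[str]], float]


def optimal_edit_cost(x: Sequence[str], y: Sequence[str],
                      weight: Weight, maximize: bool = False) -> float:
    """Cost of the best path from node (0, 0) to node (len(x), len(y))."""
    pick = max if maximize else min
    m, n = len(x), len(y)
    # best[i][j] holds the optimal cost of editing x[:i] into y[:j].
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column: deletions only
        best[i][0] = best[i - 1][0] + weight(x[i - 1], None)
    for j in range(1, n + 1):          # first row: insertions only
        best[0][j] = best[0][j - 1] + weight(None, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best[i][j] = pick(
                best[i - 1][j] + weight(x[i - 1], None),          # deletion
                best[i][j - 1] + weight(None, y[j - 1]),          # insertion
                best[i - 1][j - 1] + weight(x[i - 1], y[j - 1]),  # substitution
            )
    return best[m][n]


def unit_weight(a: Optional[str], b: Optional[str]) -> float:
    """Illustrative weights only: 0 for a match, 1 for any other operation."""
    if a is None or b is None:
        return 1.0
    return 0.0 if a == b else 1.0


def count_grid_paths(m: int, n: int) -> int:
    """Number of monotone paths from (0, 0) to (m, n) with three step types."""
    paths = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paths[i][j] = paths[i - 1][j] + paths[i][j - 1] + paths[i - 1][j - 1]
    return paths[m][n]
```

With the unit weights above, `optimal_edit_cost("kitten", "sitting", unit_weight)` is the familiar Levenshtein distance; passing `maximize=True` together with a scoring function instead selects the maximum-cost path, matching the "maximum (or minimum) cost" wording of the abstract.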
Document type :
Reports

https://hal.inria.fr/inria-00074858
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 4:35:28 PM
Last modification on : Friday, May 25, 2018 - 12:02:02 PM
Long-term archiving on : Tuesday, April 12, 2011 - 7:46:49 PM

Identifiers

  • HAL Id : inria-00074858, version 1

Citation

Guy Louchard, Wojciech Szpankowski. A Probabilistic analysis of a string edit problem. [Research Report] RR-1814, INRIA. 1992. ⟨inria-00074858⟩
