Statistical Properties of Similarity Score Functions

Jérémie Bourdon; Alban Mancheron

doi:10.46298/dmtcs.3502

Conference paper, Discrete Mathematics and Theoretical Computer Science, Year: 2006

Jérémie Bourdon

Role: Author
PersonId : 14410
IdHAL : jeremie-bourdon
ORCID : 0000-0001-6674-8626
IdRef : 070210640

Laboratoire d'Informatique de Nantes Atlantique

Alban Mancheron

Role: Author
PersonId : 6019
IdHAL : alban-mancheron
ORCID : 0000-0001-9249-7592
IdRef : 111581362

Laboratoire d'Informatique de Nantes Atlantique

Abstract

In computational biology, many problems, such as pattern discovery, involve comparing several sequences (of nucleotides, proteins, or genes, for instance). Very often, algorithms that address these problems use score functions that reflect a notion of similarity between the sequences. The most efficient methods exploit theoretical knowledge of the typical behavior of these score functions, such as their mean, their variance, and sometimes their asymptotic distribution in a given probabilistic model. In this paper, we study a recent family of score functions, introduced in Mancheron 2003, that compare two words of the same length. Here, the similarity takes into account all matches and mismatches between the two sequences, and not only the longest common subsequence as in classical algorithms such as BLAST or FASTA. Using generating functions, we provide closed-form formulas for the mean and the variance of these functions in an independent probabilistic model. Finally, we prove that every function in this family asymptotically behaves as a Gaussian random variable.
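To make the kind of statement in the abstract concrete, here is a minimal sketch that uses a deliberately simple stand-in score: the number of positions at which two words of the same length agree. This is an assumption chosen for illustration, not the family of score functions studied in the paper, whose definition is not given on this page. The function names and the uniform nucleotide distribution are likewise hypothetical. Under an independent model where every letter is drawn from the same distribution, each position matches with probability p (the sum of the squared letter probabilities), so the stand-in score has mean n·p and variance n·p·(1 − p), and the simulation below can be compared with those values.

```python
import random
from statistics import mean, pvariance


def match_count(u, v):
    """Stand-in score (an assumption): number of positions where u and v agree."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return sum(a == b for a, b in zip(u, v))


def match_probability(probs):
    """Probability that two independent letters drawn from `probs` coincide."""
    return sum(p * p for p in probs.values())


def exact_mean_variance(n, probs):
    """Mean and variance of match_count for two independent random words of length n."""
    p = match_probability(probs)
    return n * p, n * p * (1 - p)


def simulate(n, probs, trials, seed=0):
    """Empirical mean and variance of match_count over `trials` random word pairs."""
    rng = random.Random(seed)
    letters = list(probs)
    weights = [probs[c] for c in letters]
    scores = []
    for _ in range(trials):
        u = rng.choices(letters, weights=weights, k=n)
        v = rng.choices(letters, weights=weights, k=n)
        scores.append(match_count(u, v))
    return mean(scores), pvariance(scores)


# Hypothetical model: uniform distribution over the four nucleotides.
UNIFORM_DNA = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

if __name__ == "__main__":
    print("exact:    ", exact_mean_variance(100, UNIFORM_DNA))
    print("simulated:", simulate(100, UNIFORM_DNA, trials=2000))
```

Because this stand-in score is a sum of independent, identically distributed indicator variables, the classical central limit theorem already gives Gaussian behavior as the word length grows. The paper establishes the analogous result for its own family of score functions.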

Keywords

average-case analysis; score functions; sequence comparison

Domains

Data Structures and Algorithms [cs.DS]; Discrete Mathematics [cs.DM]; Combinatorics [math.CO]

Main file

dmAG0106.pdf (587.2 KB)

Origin: Publisher files allowed on an open archive

Contributor: Coordination Episciences Iam

https://inria.hal.science/hal-01184706

Submitted on: Monday, August 17, 2015, 14:24:44

Last modified on: Thursday, March 28, 2024, 13:09:58

Long-term archiving on: Wednesday, November 18, 2015, 12:09:43

Dates and versions

hal-01184706, version 1 (17-08-2015)

Identifiers

HAL Id: hal-01184706, version 1
DOI: 10.46298/dmtcs.3502

Cite

Jérémie Bourdon, Alban Mancheron. Statistical Properties of Similarity Score Functions. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. pp.129-140, ⟨10.46298/dmtcs.3502⟩. ⟨hal-01184706⟩

Collections

UNIV-NANTES CNRS LINA LINA-COMBI TDS-MACS LS2N NANTES-UNIVERSITE

276 views

586 downloads
