Local Decoding of Sequences and Alignment-Free Comparison

Gilles Didier; Ivan Laprevotte; Maude Pupin; Alain Hénaut

doi:10.1089/cmb.2006.13.1465

Journal article, Journal of Computational Biology, Year: 2006


Gilles Didier

Role: Author
PersonId: 16844
IdHAL: gilles-didier
ORCID: 0000-0003-0596-9112

Institut de mathématiques de Luminy

Ivan Laprevotte

Role: Author
PersonId: 849812

Laboratoire Statistique et Génome

Maude Pupin

Role: Author
PersonId: 6668
IdHAL: maude-pupin
ORCID: 0000-0003-3197-0715
IdRef: 165990805

Laboratoire d'Informatique Fondamentale de Lille

Sequential Learning

Alain Hénaut

Role: Author

Science et Décision

Abstract

Subword composition plays an important role in many sequence analyses. Here we define and study the "local decoding of order N" of sequences, an alternative that avoids some drawbacks of approaches based on subwords of length N while retaining information about environments of length N in the sequences. ("Decoding" is used here in the sense of hidden Markov modeling, i.e., associating a state with every position of the sequence.) We present an algorithm for computing the local decoding of order N of a given set of sequences. Its complexity is linear in the total length of the set, in both time and memory, whatever the order N. To illustrate one use of local decoding, we propose a very basic dissimilarity measure between sequences that can be computed both from the local decoding of order N and from the composition in subwords of length N. The accuracies of these two dissimilarities are evaluated over several datasets by computing their linear correlations with a reference alignment-based distance. These accuracies are also compared with the accuracy obtained from another recent alignment-free comparison.
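The abstract does not spell out the dissimilarity measure, so the following is only a minimal sketch of the general idea on the subword side, under stated assumptions: the dissimilarity is taken to be half the L1 distance between relative frequencies of overlapping subwords of length N, and the evaluation step is taken to be a plain Pearson correlation against reference distances. The function names `subword_counts`, `composition_dissimilarity`, and `pearson` are hypothetical and are not taken from the paper. Local decoding itself is not implemented here.

```python
from collections import Counter
from math import sqrt


def subword_counts(seq: str, n: int) -> Counter:
    """Count every overlapping subword of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def composition_dissimilarity(a: str, b: str, n: int) -> float:
    """Assumed measure (not the paper's): half the L1 distance between
    the relative subword frequencies of a and b.
    0.0 means identical composition, 1.0 means no shared subword."""
    ca, cb = subword_counts(a, n), subword_counts(b, n)
    ta, tb = sum(ca.values()), sum(cb.values())
    if ta == 0 or tb == 0:
        raise ValueError("both sequences must contain at least n symbols")
    words = set(ca) | set(cb)
    # Counter returns 0 for subwords that are absent from one sequence.
    return 0.5 * sum(abs(ca[w] / ta - cb[w] / tb) for w in words)


def pearson(xs, ys) -> float:
    """Linear (Pearson) correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

To mimic the evaluation described above, one would compute `composition_dissimilarity` for every pair of sequences in a dataset, obtain the reference alignment-based distance for the same pairs from an external tool, and pass the two lists to `pearson`.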

Keywords

algorithm HMM decoding sequences comparison

Domains

Bioinformatics [q-bio.QM]


https://inria.hal.science/inria-00289089

Submitted on: Thursday, 19 June 2008, 15:09:22

Last modified on: Tuesday, 12 March 2024, 10:43:31

Dates and versions

inria-00289089, version 1 (19-06-2008)

Identifiers

HAL Id: inria-00289089, version 1
DOI: 10.1089/cmb.2006.13.1465
PRODINRA: 250685

Cite

Gilles Didier, Ivan Laprevotte, Maude Pupin, Alain Hénaut. Local Decoding of Sequences and Alignment-Free Comparison. Journal of Computational Biology, 2006, 13 (8), pp.1465-1476. ⟨10.1089/cmb.2006.13.1465⟩. ⟨inria-00289089⟩


Collections

UNIV-LILLE3 CNRS INRIA UNIV-AMU UNIV-EVRY INRA LIFL IML I2M INRIA2 LAMME INRAE MATHNUM

