Local Decoding of Sequences and Alignment-Free Comparison

Abstract: Subword composition plays an important role in many analyses of sequences. Here we define and study the "local decoding of order N of sequences," an alternative that avoids some drawbacks of approaches based on subwords of length N while retaining information about the length-N environments in the sequences ("decoding" is taken here in the sense of hidden Markov modeling, i.e., associating a state with every position of the sequence). We present an algorithm for computing the local decoding of order N of a given set of sequences. Its time and memory complexity are both linear in the total length of the set, whatever the order N. To illustrate a use of local decoding, we propose a very basic dissimilarity measure between sequences that can be computed from either the local decoding of order N or the composition in subwords of length N. The accuracies of these two dissimilarities are evaluated over several datasets by computing their linear correlations with a reference alignment-based distance. These accuracies are also compared with that of another recent alignment-free comparison method.
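The abstract does not define the dissimilarity measure or the local decoding algorithm, so what follows is only a minimal Python sketch of the subword-composition side of the evaluation, under stated assumptions. `subword_composition` counts the subwords of length N of a sequence. `composition_dissimilarity` is a hypothetical stand-in that applies the Bray-Curtis dissimilarity to those counts; it is not the measure proposed in the paper. `evaluate` computes the Pearson linear correlation between these dissimilarities and a reference distance matrix, which in the paper's setting would come from an alignment-based distance. All function names are illustrative, and local decoding itself is not implemented here.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence
import math


def subword_composition(seq: str, n: int) -> Counter:
    """Count each subword (contiguous substring) of length n in seq."""
    if n < 1:
        raise ValueError("n must be positive")
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def composition_dissimilarity(a: str, b: str, n: int) -> float:
    """Hypothetical stand-in measure (NOT the paper's): Bray-Curtis
    dissimilarity between the length-n subword counts of a and b.
    Returns 0.0 for identical compositions, 1.0 when no subword is shared."""
    ca, cb = subword_composition(a, n), subword_composition(b, n)
    if sum(ca.values()) == 0 or sum(cb.values()) == 0:
        raise ValueError("both sequences must be at least n long")
    total = sum(ca.values()) + sum(cb.values())
    # Occurrences common to both compositions (a missing key in cb counts as 0).
    shared = sum(min(count, cb[w]) for w, count in ca.items())
    return 1.0 - 2.0 * shared / total


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def evaluate(seqs: Sequence[str],
             reference: Sequence[Sequence[float]], n: int) -> float:
    """Correlate the composition dissimilarity with a reference distance
    matrix (reference[i][j]) over all unordered pairs of sequences."""
    pairs = list(combinations(range(len(seqs)), 2))
    ours = [composition_dissimilarity(seqs[i], seqs[j], n) for i, j in pairs]
    ref = [reference[i][j] for i, j in pairs]
    return pearson(ours, ref)
```

For example, `subword_composition("ACGTACG", 3)` counts `ACG` twice and `CGT`, `GTA` and `TAC` once each. A correlation close to 1 would indicate that the alignment-free dissimilarity orders sequence pairs much as the reference distance does.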
Document type:
Journal article
Journal of Computational Biology, Mary Ann Liebert, 2006, 13 (8), pp.1465-1476. 〈10.1089/cmb.2006.13.1465〉

https://hal.inria.fr/inria-00289089
Contributor: Maude Pupin
Submitted on: Thursday, June 19, 2008 - 15:09:22
Last modified on: Wednesday, October 10, 2018 - 10:50:53

Citation

Gilles Didier, Ivan Laprevotte, Maude Pupin, Alain Hénaut. Local Decoding of Sequences and Alignment-Free Comparison. Journal of Computational Biology, Mary Ann Liebert, 2006, 13 (8), pp.1465-1476. 〈10.1089/cmb.2006.13.1465〉. 〈inria-00289089〉