Local Decoding of Sequences and Alignment-Free Comparison

Abstract : Subword composition plays an important role in a lot of analyses of sequences. Here we define and study the "local decoding of order N of sequences," an alternative that avoids some drawbacks of "subwords of length N" approaches while keeping informations about environments of length N in the sequences ("decoding" is taken here in the sense of hidden Markov modeling, i.e., associating some state to all positions of the sequence). We present an algorithm for computing the local decoding of order N of a given set of sequences. Its complexity is linear in the total length of the set (whatever the order N) both in time and memory space. In order to show a use of local decoding, we propose a very basic dissimilarity measure between sequences which can be computed both from local decoding of order N and composition in subwords of length N. The accuracies of these two dissimilarities are evaluated, over several datasets, by computing their linear correlations with a reference alignment-based distance. These accuracies are also compared to the one obtained from another recent alignment-free comparison.
Document type :
Journal articles
Complete list of metadatas

https://hal.inria.fr/inria-00289089
Contributor : Maude Pupin <>
Submitted on : Thursday, June 19, 2008 - 3:09:22 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM

Identifiers

Collections

Citation

Gilles Didier, Ivan Laprevotte, Maude Pupin, Alain Hénaut. Local Decoding of Sequences and Alignment-Free Comparison. Journal of Computational Biology, Mary Ann Liebert, 2006, 13 (8), pp.1465-1476. ⟨10.1089/cmb.2006.13.1465⟩. ⟨inria-00289089⟩

Share

Metrics

Record views

244