Journal articles

A Coverage Criterion for Spaced Seeds and Its Applications to Support Vector Machine String Kernels and $k$-Mer Distances

Laurent Noé 1 Donald Martin 2
1 BONSAI - Bioinformatics and Sequence Analysis
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract : Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (Onodera and Shibuya, 2013). We confirm these two results by independent experiments, and propose in this article to use a coverage criterion (Benson and Mak, 2008, Martin, 2013, Martin and Noé, 2014) to measure seed efficiency in both cases, in order to design better seed patterns. We first show how this coverage criterion can be directly measured by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification and the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be efficiently computed.
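As a rough illustration of the three criteria named in the abstract, the sketch below evaluates a spaced seed on a single alignment by brute force. It is not the automaton-based method of the article. It assumes the usual conventions: the alignment is a binary string where "1" is a match and "0" a mismatch, the seed is a binary pattern where "1" is a must-match position and "0" a don't-care position, and coverage is the number of alignment positions covered by a must-match position of at least one seed hit. The function names `seed_hits` and `coverage` are hypothetical.

```python
def seed_hits(seed: str, alignment: str) -> list[int]:
    """Return the start positions where the seed hits the alignment.

    A hit occurs when every must-match position ('1') of the seed
    faces a match ('1') in the alignment; '0' positions are ignored.
    """
    span = len(seed)
    return [
        start
        for start in range(len(alignment) - span + 1)
        if all(
            alignment[start + offset] == "1"
            for offset, symbol in enumerate(seed)
            if symbol == "1"
        )
    ]


def coverage(seed: str, alignment: str) -> int:
    """Count alignment positions covered by a '1' of at least one hit.

    Assumed definition of the coverage criterion; overlapping hits
    count each covered position only once.
    """
    covered: set[int] = set()
    for start in seed_hits(seed, alignment):
        covered.update(
            start + offset for offset, symbol in enumerate(seed) if symbol == "1"
        )
    return len(covered)


if __name__ == "__main__":
    seed = "1101"               # hypothetical seed of weight 3, span 4
    alignment = "1111011011"    # hypothetical alignment, 1 = match
    hits = seed_hits(seed, alignment)
    print("single-hit:", len(hits) >= 1)
    print("number of hits:", len(hits))
    print("coverage:", coverage(seed, alignment))
```

On this toy input the single-hit criterion only records that the seed hits, the multiple-hit criterion counts the hits, and the coverage criterion additionally accounts for how much the hits overlap.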

https://hal.inria.fr/hal-01083204
Contributor : Laurent Noé
Submitted on : Monday, November 24, 2014 - 3:01:58 PM
Last modification on : Tuesday, October 19, 2021 - 12:52:58 PM
Long-term archiving on : Wednesday, February 25, 2015 - 10:05:58 AM

Files

coverage-sensitivity.pdf
Files produced by the author(s)

Citation

Laurent Noé, Donald Martin. A Coverage Criterion for Spaced Seeds and Its Applications to Support Vector Machine String Kernels and $k$-Mer Distances. Journal of Computational Biology, Mary Ann Liebert, 2014, 21 (12), pp.28. ⟨10.1089/cmb.2014.0173⟩. ⟨hal-01083204⟩
