A. V. Aho and M. J. Corasick, Efficient string matching: An aid to bibliographic search, Communications of the ACM, vol.18, issue.6, pp.333-340, 1975.

A. Apostolico, C. Guerra, and C. Pizzi, Alignment Free Sequence Similarity with Bounded Hamming Distance, 2014 Data Compression Conference
DOI : 10.1109/DCC.2014.57

L. Bartholdi, Functionally recursive groups, 2012.

F. Bassino, J. Clément, J. Fayolle, and P. Nicodème, Constructions for clumps statistics, Discrete Mathematics and Theoretical Computer Science, pp.179-194, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00452701

G. Battaglia, D. Cangelosi, R. Grossi, and N. Pisanti, Masking patterns in sequences: A new class of motif discovery with don't cares, Theoretical Computer Science, vol.410, issue.43, pp.4327-4340, 2009.
DOI : 10.1016/j.tcs.2009.07.014

G. Benson and D. Y. Mak, Exact Distribution of a Spaced Seed Statistic for DNA Homology Detection, Proceedings of the International Symposium on String Processing and Information Retrieval (SPIRE), pp.282-293, 2008.
DOI : 10.1007/978-3-540-89097-3_27

M. Boden, M. Schöneich, S. Horwege, S. Lindner, C. Leimeister et al., Alignment-free sequence comparison with spaced k-mers, Proceedings of the German Conference on Bioinformatics (GCB), volume 34 of OpenAccess Series in Informatics (OASIcs), pp.24-34, 2013.

B. Brejová, D. G. Brown, and T. Vinař, Vector seeds: An extension to spaced seeds, Journal of Computer and System Sciences, vol.70, issue.3, pp.364-380, 2005.
DOI : 10.1016/j.jcss.2004.12.008

K. Břinda, Languages of lossless seeds, Proceedings of the International Conference on Automata and Formal Languages (AFL), pp.139-150, 2014.
DOI : 10.4204/EPTCS.151.9

J. Buhler, U. Keich, and Y. Sun, Designing seeds for similarity search in genomic DNA, Journal of Computer and System Sciences, vol.70, issue.3, pp.342-363, 2005.
DOI : 10.1016/j.jcss.2004.12.003

S. W. Burge, J. Daub, R. Eberhardt, J. Tate, L. Barquist et al., Rfam 11.0: 10 years of RNA families, Nucleic Acids Research, vol.41, issue.D1, pp.D226-D232, 2013.

S. Burkhardt and J. Kärkkäinen, Better Filtering with Gapped q-Grams, Fundamenta Informaticae, vol.56, issue.1-2, pp.51-70, 2002.
DOI : 10.1007/3-540-48194-X_6

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.5942

S. Burkhardt, A. Crauser, P. Ferragina, H.-P. Lenhof, E. Rivals, and M. Vingron, q-gram based database searching using a suffix array (QUASAR), Proceedings of the Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp.77-83, 1999.

Y. Chen, T. Souaiaia, and T. Chen, PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds, Bioinformatics, vol.25, issue.19, pp.2514-2521, 2009.
DOI : 10.1093/bioinformatics/btp486

B. Chor, D. Horn, N. Goldman, Y. Levy, and T. Massingham, Genomic DNA k-mer spectra: Models and modalities, Genome Biology, vol.10

M. Comin and D. Verzotto, Alignment-free phylogeny of whole genomes using underlying subwords, Algorithms for Molecular Biology, vol.7, issue.1, p.34, 2012.
DOI : 10.1186/1748-7188-7-34

M. Csűrös, Performing local similarity searches with variable length seeds, Proceedings of the 15th Annual Combinatorial Pattern Matching Symposium (CPM), pp.373-387, 2004.

M. David, M. Dzamba, D. Lister, L. Ilie, and M. Brudno, SHRiMP2: Sensitive yet Practical Short Read Mapping, Bioinformatics, vol.27, issue.7, pp.1011-1012, 2011.
DOI : 10.1093/bioinformatics/btr046

G. Didier, E. Corel, I. Laprevotte, A. Grossmann, and C. Landès-Devauchelle, Variable length local decoding and alignment-free sequence comparison, Theoretical Computer Science, vol.462, pp.1-11, 2012.
DOI : 10.1016/j.tcs.2012.08.005

URL : https://hal.archives-ouvertes.fr/hal-01258495

D. Do Duc, H. Q. Dinh, T. H. Dang, K. Laukens, and X. H. Hoang, AcoSeeD: An Ant Colony Optimization for Finding Optimal Spaced Seeds in Biological Sequence Search, Proceedings of the 8th International Conference on Swarm Intelligence (ANTS), pp.204-211, 2012.
DOI : 10.1007/978-3-642-32650-9_19

R. C. Edgar, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, vol.32, issue.5, pp.1792-1797, 2004.
DOI : 10.1093/nar/gkh340

URL : https://hal.archives-ouvertes.fr/hal-00897814

L. Egidi and G. Manzini, Spaced Seeds Design Using Perfect Rulers, Fundamenta Informaticae, vol.131, issue.2, pp.187-203, 2014.

L. Egidi and G. Manzini, Design and analysis of periodic multiple seeds, Theoretical Computer Science, vol.522, pp.62-76
DOI : 10.1016/j.tcs.2013.12.007

M. Farach-Colton, G. M. Landau, S. C. Sahinalp, and D. Tsur, Optimal spaced seeds for faster approximate string matching, Journal of Computer and System Sciences, vol.73, issue.7, pp.1035-1044, 2007.
DOI : 10.1016/j.jcss.2007.03.007

M. C. Frith and L. Noé, Improved search heuristics find 20 000 new alignments between human and mouse genomes, Nucleic Acids Research, vol.42, issue.7

A. Gambin, S. Lasota, M. Startek, M. Sykulski, L. Noé et al., Subset seed extension to Protein BLAST, Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms, pp.149-158, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00609791

M. Ghandi, D. Lee, M. Mohammad-Noori, and M. A. Beer, Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features, PLoS Computational Biology, vol.10, issue.7, p.e1003711, 2014.
DOI : 10.1371/journal.pcbi.1003711

URL : http://doi.org/10.1371/journal.pcbi.1003711

M. Ghandi, M. Mohammad-Noori, and M. A. Beer, Robust k-mer frequency estimation using gapped k-mers, Journal of Mathematical Biology, vol.69, issue.2, pp.469-500, 2014.
DOI : 10.1007/s00285-013-0705-3

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3895138

E. Giladi, J. Healy, G. Myers, C. Hart, P. Kapranov et al., Error Tolerant Indexing and Alignment of Short Reads with Covering Template Families, Journal of Computational Biology, vol.17, issue.10, pp.1397-1411, 2010.
DOI : 10.1089/cmb.2010.0005

M. Giraud, M. Salson, M. Duez, C. Villenet, S. Quief et al., Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing, BMC Genomics, vol.15, issue.1, p.409, 2014.
DOI : 10.1186/1471-2164-15-409

URL : https://hal.archives-ouvertes.fr/hal-01009173

R. S. Harris, Improved pairwise alignment of genomic DNA, 2007.

B. Haubold, N. Pierstorff, F. Möller, and T. Wiehe, Genome comparison without alignment using shortest unique substrings, BMC Bioinformatics, vol.6, p.123, 2005.

N. Homer, B. Merriman, and S. F. Nelson, BFAST: An Alignment Tool for Large Scale Genome Resequencing, PLoS ONE, vol.4, issue.11, p.e7767, 2009.
DOI : 10.1371/journal.pone.0007767

URL : http://doi.org/10.1371/journal.pone.0007767

J. Hopcroft, An n log n algorithm for minimizing the states in a finite automaton, The Theory of Machines and Computation, pp.189-196, 1971.

S. Horwege, S. Lindner, M. Boden, K. Hatje, M. Kollmar et al., Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches, Nucleic Acids Research, vol.42, issue.W1, pp.7-11, 2014.
DOI : 10.1093/nar/gku398

L. Huang, Dynamic programming algorithms in semiring and hypergraph frameworks, 2006.

L. Ilie, S. Ilie, and A. Mansouri-Bigvand, SpEED: fast computation of sensitive spaced seeds, Bioinformatics, vol.27, issue.17, pp.2433-2434, 2011.
DOI : 10.1093/bioinformatics/btr368

L. Ilie, H. Mohamadi, G. B. Golding, and W. F. Smyth, BOND: Basic OligoNucleotide Design, BMC Bioinformatics, vol.14, issue.1, p.69, 2013.
DOI : 10.1186/1471-2105-14-69

URL : http://doi.org/10.1186/1471-2105-14-69

T. Joachims, Learning to Classify Text using Support Vector Machines, 2002.
DOI : 10.1007/978-1-4615-0907-3

U. Keich, M. Li, B. Ma, and J. Tromp, On spaced seeds for similarity search, Discrete Applied Mathematics, vol.138, issue.3, pp.253-263, 2004.
DOI : 10.1016/S0166-218X(03)00382-2

S. M. Kiełbasa, R. Wan, K. Sato, P. Horton, and M. C. Frith, Adaptive seeds tame genomic sequence comparison, Genome Research, vol.21, issue.3, pp.487-493, 2011.

R. Kuang, E. Ie, K. Wang, K. Wang, M. Siddiqi et al., Profile-based string kernels for remote homology detection and motif extraction, Journal of Bioinformatics and Computational Biology, vol.03, issue.03, pp.527-550, 2005.
DOI : 10.1142/S021972000500120X

G. Kucherov, L. Noé, and M. A. Roytberg, Multiseed Lossless Filtration, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.2, issue.1, pp.51-61, 2005.
DOI : 10.1109/TCBB.2005.12

URL : https://hal.archives-ouvertes.fr/inria-00354810

G. Kucherov, L. Noé, and M. A. Roytberg, A unifying framework for seed sensitivity and its application to subset seeds, Journal of Bioinformatics and Computational Biology, vol.04, issue.02, pp.553-569, 2006.
DOI : 10.1142/S0219720006001977

URL : https://hal.archives-ouvertes.fr/hal-00018114

G. Kucherov, L. Noé, and M. A. Roytberg, Subset Seed Automaton, Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA), pp.180-191, 2007.
DOI : 10.1007/978-3-540-76336-9_18

URL : https://hal.archives-ouvertes.fr/inria-00170414

G. Kucherov, L. Noé, and M. A. Roytberg, Iedera subset seed design tool

C. Leimeister, M. Boden, S. Horwege, S. Lindner, and B. Morgenstern, Fast alignment-free sequence comparison using spaced-word frequencies, Bioinformatics, vol.30, issue.14, pp.1991-1999, 2014.
DOI : 10.1093/bioinformatics/btu177

C. S. Leslie, E. Eskin, and W. S. Noble, The spectrum kernel: A string kernel for SVM protein classification, Biocomputing 2002, pp.564-575, 2002.
DOI : 10.1142/9789812799623_0053

C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, Mismatch string kernels for discriminative protein classification, Bioinformatics, vol.20, issue.4, pp.467-476, 2004.
DOI : 10.1093/bioinformatics/btg431

M. Li, B. Ma, D. Kisman, and J. Tromp, PatternHunter II: Highly sensitive and fast homology search, Journal of Bioinformatics and Computational Biology, vol.02, issue.03, pp.417-439, 2004.
DOI : 10.1142/S0219720004000661

H. Lin, Z. Zhang, M. Q. Zhang, B. Ma, and M. Li, ZOOM! Zillions of oligos mapped, Bioinformatics, vol.24, issue.21, pp.2431-2437, 2008.
DOI : 10.1093/bioinformatics/btn416

Z. Liu, T. Z. Desantis, G. L. Andersen, and R. Knight, Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers, Nucleic Acids Research, vol.36, issue.18, p.120, 2008.
DOI : 10.1093/nar/gkn491

H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, Text classification using string kernels, Journal of Machine Learning Research, vol.2, pp.419-444, 2002.

T. Marschall, I. Herms, H. Kaltenbach, and S. Rahmann, Probabilistic Arithmetic Automata and Their Applications, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, issue.6, pp.1737-1750, 2012.
DOI : 10.1109/TCBB.2012.109

D. E. K. Martin, Coverage of spaced seeds as a measure of clumping, JSM Proceedings, Statistical Computing Section, 2013.

D. E. K. Martin and D. A. Coleman, Distribution of clump statistics for a collection of words, Journal of Applied Probability, vol.48, issue.4, pp.901-1204, 2011.

D. E. K. Martin and L. Noé, Faster exact probabilities for statistics of overlapping pattern occurrences, Annals of the Institute of Statistical Mathematics (AISM), 2014.

S. Maurer-Stroh, V. Gunalan, W. Wong, and F. Eisenhaber, A simple shortcut to unsupervised alignment-free phylogenetic genome groupings, even from unassembled sequencing reads, Journal of Bioinformatics and Computational Biology, vol.11, issue.06, pp.1343005-1343015, 2013.
DOI : 10.1142/S0219720013430051

C. D. Michener and R. R. Sokal, A quantitative approach to a problem in classification, Evolution, vol.11, issue.2, pp.130-162, 1957.

M. Mohri, Handbook of Weighted Automata, chapter Weighted Automata Algorithms, pp.213-254, 2009.

F. Nicolas and E. Rivals, Hardness of optimal spaced seed design, Journal of Computer and System Sciences, vol.74, issue.5, pp.831-849, 2008.
DOI : 10.1016/j.jcss.2007.10.001

URL : https://hal.archives-ouvertes.fr/lirmm-00106448

G. Nuel, Pattern Markov Chains: Optimal Markov Chain Embedding Through Deterministic Finite Automata, Journal of Applied Probability, vol.45, issue.1, pp.226-243, 2008.
DOI : 10.1214/aoap/1034801248

URL : https://hal.archives-ouvertes.fr/hal-00271298

G. Nuel, Bioinformatics -Trends and Methodologies, chapter Significance Score of Motifs in Biological Sequences

T. Onodera and T. Shibuya, The Gapped Spectrum Kernel for Support Vector Machines, Proceedings of the International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM), pp.1-15, 2013.
DOI : 10.1007/978-3-642-39712-7_1

J. Pin, Tropical semirings, Publ. Newton Inst, vol.11, pp.50-69, 1998.
DOI : 10.1017/CBO9780511662508.004

URL : https://hal.archives-ouvertes.fr/hal-00113779

J. Qi, H. Luo, and B. Hao, CVTree: a phylogenetic tree reconstruction tool based on whole genomes, Nucleic Acids Research, vol.32, issue.Web Server, pp.45-47, 2004.
DOI : 10.1093/nar/gkh362

K. R. Rasmussen, J. Stoye, and E. W. Myers, Efficient q-Gram Filters for Finding All ε-Matches over a Given Length, Journal of Computational Biology, vol.13, issue.2, pp.296-308, 2006.
DOI : 10.1007/11415770_15

M. Régnier, B. Fang, and D. Iakovishina, Clump combinatorics, automata, and word asymptotics, Proceedings of the Workshop on Analytic Algorithmics and Combinatorics (ANALCO), 2014.

H. Saigo, J. Vert, N. Ueda, and T. Akutsu, Protein homology detection using string alignment kernels, Bioinformatics, vol.20, issue.11, pp.1682-1689, 2004.
DOI : 10.1093/bioinformatics/bth141

URL : https://hal.archives-ouvertes.fr/hal-00433587

C. Schensted, Longest increasing and decreasing subsequences, Journal canadien de mathématiques, vol.13, issue.0, pp.179-191, 1961.
DOI : 10.4153/CJM-1961-015-3

I. Simon, Recognizable sets with multiplicities in the tropical semiring, Mathematical Foundations of Computer Science (MFCS), LNCS, vol.324, pp.107-120, 1988.
DOI : 10.1007/BFb0017135

G. E. Sims, S.-R. Jun, G. A. Wu, and S.-H. Kim, Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions, Proceedings of the National Academy of Sciences, vol.106, issue.8, pp.2677-2682, 2009.

V. T. Stefanov, S. Robin, and S. Schbath, Waiting times for clumps of patterns and for structured motifs in random sequences, Discrete Applied Mathematics, vol.155, issue.6-7, pp.868-880, 2007.
DOI : 10.1016/j.dam.2005.07.016

URL : https://hal.archives-ouvertes.fr/hal-01197504

P. K. Strope and E. N. Moriyama, Simple alignment-free methods for protein classification: A case study from G-protein-coupled receptors, Genomics, vol.89, issue.5, pp.602-612, 2007.

S. Vinga, Editorial: Alignment-free methods in computational biology, Briefings in Bioinformatics, vol.15, issue.3, pp.341-342, 2014.
DOI : 10.1093/bib/bbu005

S. Vinga and J. Almeida, Alignment-free sequence comparison--a review, Bioinformatics, vol.19, issue.4, pp.513-523, 2003.
DOI : 10.1093/bioinformatics/btg005

J. Yang and L. Zhang, Run Probabilities of Seed-Like Patterns and Identifying Good Transition Seeds, Journal of Computational Biology, vol.15, issue.10, pp.1295-1313, 2008.
DOI : 10.1089/cmb.2007.0209

L. Zhou, I. Mihai, and L. Florea, Spaced seeds for cross-species cDNA-to-genome sequence alignment, Communications in Information and Systems, vol.10, issue.2, pp.115-136, 2010.