G. W. Beadle and M. Beadle, The language of life: an introduction to the science of genetics, 1966.

S. Clancy and W. Brown, Translation: DNA to mRNA to protein, Nature Education, 2008.

D. B. Searls, The computational linguistics of biological sequences, Artificial Intelligence and Molecular Biology, pp.47-120, 1993.

D. B. Searls, Linguistic approaches to biological sequences, Bioinformatics, vol.13, issue.4, pp.333-344, 1997.
DOI : 10.1093/bioinformatics/13.4.333

D. B. Searls, The language of genes, Nature, vol.420, issue.6912, pp.211-217, 2002.
DOI : 10.1038/nature01255

D. Chiang, A. K. Joshi, and D. B. Searls, Grammatical Representations of Macromolecular Structure, Journal of Computational Biology, vol.13, issue.5, pp.1077-1100, 2006.
DOI : 10.1089/cmb.2006.13.1077

D. B. Searls, A primer in macromolecular linguistics, Biopolymers, vol.99, issue.3, pp.203-217, 2013.
DOI : 10.1002/bip.22101

A. K. Joshi, D. J. Weir, and K. Vijay-shanker, The convergence of mildly context-sensitive grammar formalisms, 1990.

S. Dong and D. B. Searls, Gene Structure Prediction by Linguistic Methods, Genomics, vol.23, issue.3, pp.540-551, 1994.
DOI : 10.1006/geno.1994.1541

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.2945

J. Nicolas, P. Durand, G. Ranchy, S. Tempel, and A. Valin, Suffix-tree analyser (STAN): looking for nucleotidic and peptidic patterns in chromosomes, Bioinformatics, vol.21, issue.24, pp.4408-4410, 2005.
DOI : 10.1093/bioinformatics/bti710

M. Dsouza, N. Larsen, and R. Overbeek, Searching for patterns in genomic data, Trends in Genetics, vol.13, issue.12, pp.497-498, 1997.
DOI : 10.1016/S0168-9525(97)01347-4

G. Pesole, S. Liuni, and M. Souza, PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance, Bioinformatics, vol.16, issue.5, pp.439-450, 2000.
DOI : 10.1093/bioinformatics/16.5.439

C. Belleannée, O. Sallou, and J. Nicolas, Logol: Expressive Pattern Matching in Sequences. Application to Ribosomal Frameshift Modeling, Pattern Recognition in Bioinformatics: 9th IAPR International Conference, pp.34-47, 2014.
DOI : 10.1007/978-3-319-09192-1_4

T. J. Macke, D. J. Ecker, R. R. Gutell, D. Gautheret, D. A. Case et al., RNAMotif, an RNA secondary structure definition and search algorithm, Nucleic Acids Research, vol.29, issue.22, pp.4724-4735, 2001.
DOI : 10.1093/nar/29.22.4724

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC92549/pdf

S. Eddy, RNABOB: a program to search for RNA secondary structure motifs in sequence databases, 1996.

S. Graf, D. Strothmann, S. Kurtz, and G. Steger, HyPaLib: a database of RNAs and RNA structural elements defined by hybrid patterns, Nucleic Acids Research, vol.29, issue.1, pp.196-198, 2001.
DOI : 10.1093/nar/29.1.196

D. Strothmann, S. A. Gräf, S. Kurtz, and G. Steger, The syntax and semantics of a language for describing complex patterns in biological sequences, 2000.

B. Billoud, M. Kontic, and A. Viari, Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence database, Nucleic Acids Research, vol.24, issue.8, pp.1395-1403, 1996.
DOI : 10.1093/nar/24.8.1395

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC145829/pdf

F. Meyer, S. Kurtz, R. Backofen, S. Will, and M. Beckstette, Structator: fast index-based search for RNA sequence-structure patterns, BMC Bioinformatics, vol.12, issue.1, p.214, 2011.
DOI : 10.1186/1471-2105-12-214

URL : http://doi.org/10.1186/1471-2105-12-214

D. Pribnow, Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter., Proceedings of the National Academy of Sciences, vol.72, issue.3, pp.784-792, 1975.
DOI : 10.1073/pnas.72.3.784

J. van Helden, The Analysis of Regulatory Sequences, In: Multiple Aspects of DNA and RNA: from Biophysics to Bioinformatics: Lecture Notes of the Les Houches Summer School, 2004.

L. Parida, Pattern Discovery in Bioinformatics: Theory & Algorithms, 2007.
DOI : 10.1201/9781420010732

T. D. Schneider, G. D. Stormo, L. Gold, and A. Ehrenfeucht, Information content of binding sites on nucleotide sequences, Journal of Molecular Biology, vol.188, issue.3, pp.415-446, 1986.
DOI : 10.1016/0022-2836(86)90165-8

T. Schneider, Information theory primer, 1995.

G. E. Crooks, G. Hon, J. M. Chandonia, and S. E. Brenner, WebLogo: A Sequence Logo Generator, Genome Research, vol.14, issue.6, pp.1188-1190, 2004.
DOI : 10.1101/gr.849004

URL : http://genome.cshlp.org/content/14/6/1188.full.pdf

S. Kullback and R. A. Leibler, On Information and Sufficiency, The Annals of Mathematical Statistics, vol.22, issue.1, pp.79-86, 1951.
DOI : 10.1214/aoms/1177729694

G. Z. Hertz, G. W. Hartzell, and G. D. Stormo, Identification of consensus patterns in unaligned DNA sequences known to be functionally related, Bioinformatics, vol.6, issue.2, pp.81-92, 1990.
DOI : 10.1093/bioinformatics/6.2.81

G. Z. Hertz and G. D. Stormo, Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, Bioinformatics, vol.15, issue.7, pp.563-577, 1999.
DOI : 10.1093/bioinformatics/15.7.563

G. D. Stormo and G. W. Hartzell, Identifying protein-binding sites from unaligned DNA fragments., Proceedings of the National Academy of Sciences, vol.86, issue.4, pp.1183-1187, 1989.
DOI : 10.1073/pnas.86.4.1183

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC286650/pdf

T. L. Bailey and C. Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proc Int Conf Intell Syst Mol Biol, vol.2, pp.28-36, 1994.

C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald et al., Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Science, vol.262, issue.5131, pp.208-214, 1993.
DOI : 10.1126/science.8211139

A. F. Neuwald, J. S. Liu, and C. E. Lawrence, Gibbs motif sampling: Detection of bacterial outer membrane protein repeats, Protein Science, vol.4, issue.8, pp.1618-1632, 1995.
DOI : 10.1002/pro.5560040820

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143180

A. F. Neuwald, J. S. Liu, D. J. Lipman, and C. E. Lawrence, Extracting protein alignment models from the sequence database, Nucleic Acids Research, vol.25, issue.9, pp.1665-1677, 1997.
DOI : 10.1093/nar/25.9.1665

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC146639/pdf

F. P. Roth, J. D. Hughes, P. W. Estep, and G. M. Church, Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation, Nature Biotechnology, vol.16, issue.10, pp.939-945, 1998.
DOI : 10.1038/nbt1098-939

G. Thijs, M. Lescot, K. Marchal, S. Rombauts, B. De-moor et al., A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling, Bioinformatics, vol.17, issue.12, pp.1113-1122, 2001.
DOI : 10.1093/bioinformatics/17.12.1113

X. Liu, D. L. Brutlag, and J. S. Liu, BIOPROSPECTOR: DISCOVERING CONSERVED DNA MOTIFS IN UPSTREAM REGULATORY REGIONS OF CO-EXPRESSED GENES, Biocomputing 2001, pp.127-138, 2001.
DOI : 10.1142/9789814447362_0014

V. Matys, O. V. Kel-margoulis, E. Fricke, I. Liebich, S. Land et al., TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes, Nucleic Acids Research, vol.34, issue.90001, pp.108-110, 2006.
DOI : 10.1093/nar/gkj143

URL : http://doi.org/10.1093/nar/gkj143

A. Sandelin, W. Alkema, P. Engström, W. W. Wasserman, and B. Lenhard, JASPAR: an open-access database for eukaryotic transcription factor binding profiles, Nucleic Acids Research, vol.32, issue.90001, pp.91-94, 2004.
DOI : 10.1093/nar/gkh012

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC308747

W. R. Taylor, The classification of amino acid conservation, Journal of Theoretical Biology, vol.119, issue.2, pp.205-218, 1986.
DOI : 10.1016/S0022-5193(86)80075-3

S. R. Eddy, Where did the BLOSUM62 alignment score matrix come from?, Nature Biotechnology, vol.22, issue.8, pp.1035-1036, 2004.
DOI : 10.1038/nbt0804-1035

S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, vol.48, issue.3, pp.443-453, 1970.
DOI : 10.1016/0022-2836(70)90057-4

T. Smith and M. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, issue.1, pp.195-197, 1981.
DOI : 10.1016/0022-2836(81)90087-5

W. R. Pearson and D. J. Lipman, Improved tools for biological sequence comparison., Proceedings of the National Academy of Sciences, vol.85, issue.8, pp.2444-2448, 1988.
DOI : 10.1073/pnas.85.8.2444

URL : http://www.pnas.org/content/85/8/2444.full.pdf

S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990.
DOI : 10.1016/S0022-2836(05)80360-2

J. D. Thompson, D. G. Higgins, and T. J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research, vol.22, issue.22, pp.4673-4680, 1994.
DOI : 10.1093/nar/22.22.4673

C. Notredame, D. G. Higgins, and J. Heringa, T-coffee: a novel method for fast and accurate multiple sequence alignment, Journal of Molecular Biology, vol.302, issue.1, pp.205-217, 2000.
DOI : 10.1006/jmbi.2000.4042

C. B. Do, M. S. Mahabhashyam, M. Brudno, and S. Batzoglou, ProbCons: Probabilistic consistency-based multiple sequence alignment, Genome Research, vol.15, issue.2, pp.330-340, 2005.
DOI : 10.1101/gr.2821705

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC546535

R. C. Edgar, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, vol.32, issue.5, pp.1792-1797, 2004.
DOI : 10.1093/nar/gkh340

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC390337

K. Katoh, K. Misawa, K. I. Kuma, and T. Miyata, MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform, Nucleic Acids Research, vol.30, issue.14, pp.3059-3066, 2002.
DOI : 10.1093/nar/gkf436

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC135756/pdf

B. Morgenstern, K. Frech, A. Dress, and T. Werner, DIALIGN: finding local similarities by multiple sequence alignment, Bioinformatics, vol.14, issue.3, pp.290-294, 1998.
DOI : 10.1093/bioinformatics/14.3.290

B. Morgenstern, DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment, Bioinformatics, vol.15, issue.3, pp.211-218, 1999.
DOI : 10.1093/bioinformatics/15.3.211

S. R. Eddy, Profile hidden Markov models, Bioinformatics, vol.14, issue.9, pp.755-763, 1998.
DOI : 10.1093/bioinformatics/14.9.755

M. Gribskov, A. D. Mclachlan, and D. Eisenberg, Profile analysis: detection of distantly related proteins., Proceedings of the National Academy of Sciences, vol.84, issue.13, pp.4355-4363, 1987.
DOI : 10.1073/pnas.84.13.4355

A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler, Hidden Markov Models in Computational Biology, Journal of Molecular Biology, vol.235, issue.5, pp.1501-1532, 1994.
DOI : 10.1006/jmbi.1994.1104

P. Baldi, Y. Chauvin, T. Hunkapiller, and M. A. Mcclure, Hidden Markov models of biological primary sequence information., Proceedings of the National Academy of Sciences, vol.91, issue.3, pp.1059-1063, 1994.
DOI : 10.1073/pnas.91.3.1059

URL : http://www.pnas.org/content/91/3/1059.full.pdf

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.

J. G. Henikoff and S. Henikoff, Using substitution probabilities to improve position-specific scoring matrices, Computer Applications in the Biosciences (CABIOS), vol.12, issue.2, pp.135-143, 1996.
DOI : 10.1093/bioinformatics/12.2.135

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.6780

J. M. Claverie, Some useful statistical properties of position-weight matrices, Computers & Chemistry, vol.18, issue.3, pp.287-294, 1994.
DOI : 10.1016/0097-8485(94)85024-0

K. Sjölander, K. Karplus, M. Brown, R. Hughey, A. Krogh et al., Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology, Computer Applications in the Biosciences (CABIOS), vol.12, pp.327-345, 1996.

M. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjölander et al., Using Dirichlet mixture priors to derive hidden Markov models for protein families, Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology, pp.47-55, 1993.

R. Hughey and A. Krogh, Hidden Markov models for sequence analysis: extension and analysis of the basic method, Bioinformatics, vol.12, issue.2, pp.95-107, 1996.
DOI : 10.1093/bioinformatics/12.2.95

E. L. Sonnhammer, S. R. Eddy, and R. Durbin, Pfam: A comprehensive database of protein domain families based on seed alignments, Proteins: Structure, Function, and Genetics, vol.28, issue.3, pp.405-420, 1997.
DOI : 10.1002/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L

R. D. Finn, A. Bateman, J. Clements, P. Coggill, R. Y. Eberhardt et al., Pfam: the protein families database, Nucleic Acids Research, 2013.
DOI : 10.1093/nar/gkt1223

URL : https://hal.archives-ouvertes.fr/hal-01294685

D. H. Haft, J. D. Selengut, R. A. Richter, D. Harkins, M. K. Basu et al., TIGRFAMs and Genome Properties in 2013, Nucleic Acids Research, vol.41, issue.D1, pp.387-395, 2013.
DOI : 10.1093/nar/gks1234

URL : http://doi.org/10.1093/nar/gks1234

J. Moult, A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, Current Opinion in Structural Biology, vol.15, issue.3, pp.285-289, 2005.
DOI : 10.1016/j.sbi.2005.05.011

J. Gough, K. Karplus, R. Hughey, and C. Chothia, Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure, Journal of Molecular Biology, vol.313, issue.4, pp.903-919, 2001.
DOI : 10.1006/jmbi.2001.5080

S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, vol.25, issue.17, pp.3389-3402, 1997.
DOI : 10.1093/nar/25.17.3389

The UniProt Consortium, Update on activities at the Universal Protein Resource (UniProt) in 2013, Nucleic Acids Research, vol.41, issue.D1, pp.43-47, 2013.

K. D. Pruitt, T. Tatusova, and D. R. Maglott, NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins, Nucleic Acids Research, vol.33, issue.Database issue, pp.501-504, 2005.
DOI : 10.1093/nar/gki025

K. Karplus, Hidden Markov models for detecting remote protein homologies, Bioinformatics, vol.14, issue.10, pp.846-865, 1998.
DOI : 10.1093/bioinformatics/14.10.846

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.132.5575

K. Karplus, R. Karchin, C. Barrett, S. Tu, M. Cline et al., What is the value added by human intervention in protein structure prediction?, Proteins: Structure, Function, and Genetics, vol.45, issue.S5, pp.86-91, 2001.
DOI : 10.1002/prot.10021

K. Karplus, R. Karchin, J. Draper, J. Casper, Y. Mandel-gutfreund et al., Combining local-structure, fold-recognition, and new fold methods for protein structure prediction, Proteins: Structure, Function, and Genetics, vol.53, issue.S6, pp.491-496, 2003.
DOI : 10.1002/prot.10540

S. R. Eddy, Accelerated Profile HMM Searches, PLoS Computational Biology, vol.7, issue.10, p.1002195, 2011.
DOI : 10.1371/journal.pcbi.1002195

URL : http://doi.org/10.1371/journal.pcbi.1002195

J. Söding, Protein homology detection by HMM-HMM comparison, Bioinformatics, vol.21, issue.7, pp.951-960, 2005.
DOI : 10.1093/bioinformatics/bti125

M. Remmert, A. Biegert, A. Hauser, and J. Söding, HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment, Nature Methods, vol.9, issue.2, pp.173-175, 2012.
DOI : 10.1038/nmeth.1818

URL : http://pubman.mpdl.mpg.de/pubman/item/escidoc:1944218/component/escidoc:1945401/1944218.pdf

T. J. Wheeler and S. R. Eddy, nhmmer: DNA homology search with profile HMMs, Bioinformatics, vol.29, issue.19, pp.2487-2489, 2013.
DOI : 10.1093/bioinformatics/btt403

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3777106

T. J. Wheeler, J. Clements, S. R. Eddy, R. Hubley, T. A. Jones et al., Dfam: a database of repetitive DNA based on profile hidden Markov models, Nucleic Acids Research, vol.41, issue.D1, pp.70-82, 2013.
DOI : 10.1093/nar/gks1265

S. R. Eddy, A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure, BMC Bioinformatics, vol.3, issue.1, p.18, 2002.
DOI : 10.1186/1471-2105-3-18

Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjölander et al., Recent methods for RNA modeling using stochastic context-free grammars, Proceedings of the Asilomar Conference on Combinatorial Pattern Matching, pp.289-306, 1994.
DOI : 10.1007/3-540-58094-8_25

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.9031

S. R. Eddy and R. Durbin, RNA sequence analysis using covariance models, Nucleic Acids Research, vol.22, issue.11, pp.2079-2088, 1994.
DOI : 10.1093/nar/22.11.2079

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308124/pdf

S. W. Burge, J. Daub, R. Eberhardt, J. Tate, L. Barquist et al., Rfam 11.0: 10 years of RNA families, Nucleic Acids Research, vol.41, issue.D1, pp.226-232, 2013.
DOI : 10.1093/nar/gks1005

E. P. Nawrocki and S. R. Eddy, Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics, vol.29, issue.22, pp.2933-2935, 2013.
DOI : 10.1093/bioinformatics/btt509

Y. Uemura, A. Hasegawa, S. Kobayashi, and T. Yokomori, Tree adjoining grammars for RNA structure prediction, Theoretical Computer Science, vol.210, issue.2, pp.277-303, 1999.
DOI : 10.1016/S0304-3975(98)00090-5

URL : http://doi.org/10.1016/s0304-3975(98)00090-5

E. Rivas and S. Eddy, The language of RNA: a formal grammar that includes pseudoknots, Bioinformatics, vol.16, issue.4, pp.334-340, 2000.
DOI : 10.1093/bioinformatics/16.4.334

L. Cai, R. L. Malmberg, and Y. Wu, Stochastic modeling of RNA pseudoknotted structures: a grammatical approach, Bioinformatics, vol.19, issue.Suppl 1, pp.66-73, 2003.
DOI : 10.1093/bioinformatics/btg1007

H. Matsui, K. Sato, and Y. Sakakibara, Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures, Proc IEEE Comput Syst Bioinform Conf, pp.290-299, 2004.
DOI : 10.1109/csb.2004.1332442

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.112.9004

W. N. Grundy, T. L. Bailey, C. P. Elkan, and M. E. Baker, meta-MEME: Motif-based hidden Markov models of protein families, Bioinformatics, vol.13, issue.4, pp.397-406, 1997.
DOI : 10.1093/bioinformatics/13.4.397

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.319.9644

I. Jonassen, J. Collins, and D. Higgins, Finding flexible patterns in unaligned protein sequences, Protein Science, vol.4, issue.8, pp.1587-1595, 1995.
DOI : 10.1002/pro.5560040817

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143188

N. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, B. A. Cuche et al., The 20 years of PROSITE, Nucleic Acids Research, vol.36, issue.Database, pp.245-249, 2008.
DOI : 10.1093/nar/gkm977

T. Yokomori, N. Ishida, and S. Kobayashi, Learning local languages and its application to protein α-chain identification, Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences HICSS-94, pp.113-122, 1994.
DOI : 10.1109/HICSS.1994.323560

T. Yokomori and S. Kobayashi, Learning local languages and their application to DNA sequence analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.10, pp.1067-1079, 1998.
DOI : 10.1109/34.722617

P. Garcia, E. Vidal, and J. Oncina, Learning locally testable languages in the strict sense, Proceedings of the International Conference on Algorithmic Learning Theory, pp.325-338, 1990.

P. Garcia and E. Vidal, Inference of k-testable languages in the strict sense and application to syntactic pattern recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.12, issue.9, pp.920-925, 1990.
DOI : 10.1109/34.57687

P. Peris, D. López, M. Campos, and J. M. Sempere, Protein Motif Prediction by Grammatical Inference, Lecture Notes in Computer Science, vol.4201, pp.175-187, 2006.
DOI : 10.1007/11872436_15

P. Peris, D. López, and M. Campos, IgTM: An algorithm to predict transmembrane domains and topology in proteins, BMC Bioinformatics, vol.9, issue.1, 2008.
DOI : 10.1186/1471-2105-9-367

URL : http://doi.org/10.1186/1471-2105-9-367

P. Garcia, E. Vidal, and F. Casacuberta, Local Languages, the Successor Method, and a Step Towards a General Methodology for the Inference of Regular Grammars, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.9, issue.6, pp.841-845, 1987.
DOI : 10.1109/TPAMI.1987.4767991

J. Oncina and P. Garcia, Inferring regular languages in polynomial update time, In: Pattern Recognition and Image Analysis, pp.49-61, 1992.
DOI : 10.1142/9789812797902_0004

K. J. Lang, Random DFA's can be approximately learned from sparse uniform examples, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, pp.45-52, 1992.
DOI : 10.1145/130385.130390

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.1047

K. J. Lang, B. A. Pearlmutter, and R. A. Price, Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm, Grammatical Inference, 4th International Colloquium, ICGI-98 Proceedings. Volume 1433 of Lecture Notes in Computer Science, pp.1-12, 1998.
DOI : 10.1007/BFb0054059

F. Coste, G. Kerbellec, B. Idmont, D. Fredouille, and C. Delamarche, Apprentissage d'automates par fusions de paires de fragments significativement similaires et premières expérimentations sur les protéines MIP, In: JOBIM, 2004.

F. Coste and G. Kerbellec, A Similar Fragments Merging Approach to Learn Automata on Proteins, Lecture Notes in Computer Science, vol.3720, pp.522-529, 2005.
DOI : 10.1007/11564096_50

URL : https://hal.archives-ouvertes.fr/inria-00000179

F. Coste and G. Kerbellec, Learning Automata on Protein Sequences, pp.199-210, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00180429

G. Kerbellec, Apprentissage d'automates modélisant des familles de séquences protéiques, Ph.D. thesis, Université de Rennes 1, 2008.

A. Bretaudeau, F. Coste, F. Humily, L. Garczarek, G. L. Corguillé et al., CyanoLyase: a database of phycobilin lyase sequences, motifs and functions, Nucleic Acids Research, vol.41, issue.D1, pp.396-401, 2013.
DOI : 10.1093/nar/gks1091

URL : https://hal.archives-ouvertes.fr/hal-01094087

A. Burgos, F. Coste, and G. Kerbellec, Learning automata on protein sequences by partial multiple sequence alignment

F. Coste and D. Fredouille, What is the Search Space for the Inference of Non Deterministic, Unambiguous and Deterministic Automata?, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00071673

W. Dyrka and J. C. Nebel, A stochastic context free grammar based framework for analysis of protein sequences, BMC Bioinformatics, vol.10, issue.1, p.323, 2009.
DOI : 10.1186/1471-2105-10-323

F. Coste, G. Garet, and J. Nicolas, Local Substitutability for Sequence Generalization, ICGI 2012 Conference Proceedings, pp.97-111, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00730553

A. Clark and R. Eyraud, Identification in the Limit of Substitutable Context-Free Languages, Proceedings of the 16th International Conference on Algorithmic Learning Theory, pp.283-296, 2005.
DOI : 10.1007/11564089_23

URL : https://hal.archives-ouvertes.fr/hal-00186889

A. Clark and R. Eyraud, Polynomial identification in the limit of substitutable context-free languages, Journal of Machine Learning Research, vol.8, pp.1725-1745, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00186889

R. Yoshinaka, Identification in the Limit of k,l-Substitutable Context-Free Languages, Lecture Notes in Computer Science, vol.5278, pp.266-279, 2008.
DOI : 10.1007/978-3-540-88009-7_21

Z. Harris, Distributional structure, pp.146-162, 1954.

F. Coste, G. Garet, and J. Nicolas, A bottom-up efficient algorithm learning substitutable languages from positive examples, Conference Proceedings, pp.49-63, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01080249

C. G. Nevill-manning and I. H. Witten, Compression and Explanation using Hierarchical Grammars, The Computer Journal, vol.40, issue.2 and 3, pp.103-116, 1997.
DOI : 10.1093/comjnl/40.2_and_3.103

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.7784

N. Cherniavsky and R. Lander, Grammar-based compression of DNA sequences, DIMACS Working Group on the Burrows-Wheeler Transform, 2004.

J. K. Lanctot, M. Li, and E. H. Yang, Estimating DNA sequence entropy, ACM-SIAM Symposium on Discrete Algorithms, pp.409-418, 2000.

A. Apostolico and S. Lonardi, Off-line compression by greedy textual substitution, Proceedings of the IEEE, vol.88, issue.11, pp.1733-1744, 2000.
DOI : 10.1109/5.892709

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.9094

A. Apostolico and S. Lonardi, Compression of biological sequences by greedy off-line textual substitution, Proceedings DCC 2000. Data Compression Conference, pp.143-153, 2000.
DOI : 10.1109/DCC.2000.838154

C. Nevill-manning and I. Witten, On-line and off-line heuristics for inferring hierarchies of repetitions in sequences, Proceedings of the IEEE, vol.88, issue.11, pp.1745-1755, 2000.
DOI : 10.1109/5.892710

R. Carrascosa, F. Coste, M. Gallé, and G. G. López, The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing, Algorithms, vol.4, issue.4, pp.262-284, 2011.
DOI : 10.3390/a4040262

URL : https://hal.archives-ouvertes.fr/inria-00638445

R. Carrascosa, F. Coste, M. Gallé, and G. G. López, Searching for smallest grammars on large sequences and application to DNA, Journal of Discrete Algorithms, vol.11, pp.62-72, 2012.
DOI : 10.1016/j.jda.2011.04.006

URL : https://hal.archives-ouvertes.fr/inria-00536633

B. Brejova, T. Vinar, and M. Li, Pattern Discovery: Methods and Software, pp.491-522, 2003.
DOI : 10.1385/1-59259-335-6:491

Y. Sakakibara, Grammatical inference in bioinformatics, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.7, pp.1051-1062, 2005.
DOI : 10.1109/TPAMI.2005.140

R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1999.
DOI : 10.1017/CBO9780511790492

P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach, 2001.

C. De-la-higuera, Grammatical Inference: Learning Automata and Grammars, 2010.
DOI : 10.1017/CBO9781139194655

URL : https://hal.archives-ouvertes.fr/hal-00476128