S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990.
DOI : 10.1016/S0022-2836(05)80360-2

A. K. Arakaki, Y. Huang, and J. Skolnick, EFICAz2: enzyme function inference by a combined approach enhanced by machine learning, BMC Bioinformatics, vol.10, issue.11, 2009.

J. Allouch, M. Jam, W. Helbert, T. Barbeyron, B. Kloareg et al., The three-dimensional structures of two β-agarases, Journal of Biological Chemistry, vol.278, issue.47, pp.47171-47180, 2003.

N. Abe and H. Mamitsuka, Predicting Protein Secondary Structure Using Stochastic Tree Grammars, Machine Learning, vol.29, issue.2-3, pp.275-301, 1997.

S. Abdeddaïm and B. Morgenstern, Speeding Up the DIALIGN Multiple Alignment Program by Using the 'Greedy Alignment of BIOlogical Sequences LIBrary' (GABIOS-LIB), Computational Biology, pp.1-11, 2001.

D. Angluin, Inductive inference of formal languages from positive data, Information and Control, vol.45, issue.2, pp.117-135, 1980.
DOI : 10.1016/S0019-9958(80)90285-5

D. Angluin, Inference of Reversible Languages, J. ACM, vol.29, issue.3, pp.741-765, 1982.

D. Angluin, Learning regular sets from queries and counterexamples, Information and Computation, pp.87-106, 1987.

U. B. Angadi and M. Venkatesulu, Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning, IEEE/ACM Trans. Comput. Biology Bioinform, vol.9, issue.2, pp.601-608, 2012.

A. M. Burroughs, K. N. Allen, D. Dunaway-Mariano, and L. Aravind, Evolutionary Genomics of the HAD Superfamily: Understanding the Structural Adaptations and Catalytic Diversity in a Superfamily of Phosphoesterases and Allied Enzymes, Journal of Molecular Biology, vol.361, issue.5, pp.1003-1034, 2006.
DOI : 10.1016/j.jmb.2006.06.049

A. Bairoch, The ENZYME database in 2000, Nucleic Acids Research, vol.28, issue.1, pp.304-305, 2000.
DOI : 10.1093/nar/28.1.304

P. Baldi and S. Brunak, Bioinformatics : The Machine Learning Approach, 2001.

T. L. Bailey, M. Boden, F. A. Buske, M. Frith, C. E. Grant et al., MEME Suite: tools for motif discovery and searching, Nucleic Acids Research, vol.37, issue.2, pp.202-208, 2009.

A. Bateman, L. Coin, R. Durbin, R. D. Finn, V. Hollich et al., The Pfam protein families database, Nucleic Acids Research, vol.32, issue.1, pp.D138-D141, 2004.
URL : https://hal.archives-ouvertes.fr/hal-01294685

G. Brewka, T. Eiter, and M. Truszczynski, Answer set programming at a glance, Communications of the ACM, vol.54, issue.12, pp.92-103, 2011.
DOI : 10.1145/2043174.2043195

M. Brameier, J. Haan, A. Krings, and R. M. MacCallum, Automatic discovery of cross-family sequence features associated with protein function, BMC Bioinformatics, vol.7, issue.1, p.16, 2006.
DOI : 10.1186/1471-2105-7-16

A. Brazma, I. Jonassen, I. Eidhammer, and D. Gilbert, Approaches to the Automatic Discovery of Patterns in Biosequences, Journal of Computational Biology, vol.5, issue.2, pp.279-305, 1998.
DOI : 10.1089/cmb.1998.5.279

S. Busygin, O. Prokopyev, and P. M. Pardalos, Biclustering in data mining, Computers & Operations Research, vol.35, issue.9, pp.2964-2987, 2008.
DOI : 10.1016/j.cor.2007.01.005

C. Bystroff, V. Thorsson, and D. Baker, HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins, Journal of Molecular Biology, vol.301, issue.1, pp.173-190, 2000.

R. Carrascosa, F. Coste, M. Gallé, and G. G. López, The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing, Algorithms, vol.4, issue.4, pp.262-284, 2011.
DOI : 10.3390/a4040262

URL : https://hal.archives-ouvertes.fr/inria-00638445

B. L. Cantarel, P. M. Coutinho, C. Rancurel, T. Bernard, V. Lombard et al., The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics, Nucleic Acids Research, vol.37, issue.1, pp.233-238, 2009.

A. Clark and R. Eyraud, Polynomial Identification in the Limit of Substitutable Context-free Languages, Journal of Machine Learning Research, vol.8, pp.1725-1745, 2007.

F. Coste, G. Garet, A. Groisillier, J. Nicolas, and T. Tonon, Automated enzyme classification by formal concept analysis, Formal Concept Analysis, pp.235-250, 2014.

F. Coste, G. Garet, and J. Nicolas, Local Substitutability for Sequence Generalization, ICGI 2012 Conference Proceedings, pp.97-111, 2012.

N. Chomsky, Aspects of the theory of syntax, DTIC Document, 1964.

D. Chiang, A. K. Joshi, and D. B. Searls, Grammatical Representations of Macromolecular Structure, Journal of Computational Biology, vol.13, issue.5, pp.1077-1100.

C. Chothia and A. M. Lesk, The relation between the divergence of sequence and structure in proteins, The EMBO Journal, issue.4, pp.823-826.

A. Clark, Unsupervised induction of stochastic context-free grammars using distributional clustering, Proceedings of the 2001 workshop on Computational Natural Language Learning , ConLL '01, p.13, 2001.
DOI : 10.3115/1117822.1117831

A. Clark, Combining distributional and morphological information for part of speech induction, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, pp.59-66, 2003.

A. Clark, Distributional Learning of some Context-free Languages with a Minimally Adequate Teacher, Grammatical Inference: Theoretical Results and Applications. Proceedings of the International Colloquium on Grammatical Inference, number 6339 in Lecture Notes in Computer Science, pp.24-37, 2010.

A. Clark, Learning Context Free Grammars with the Syntactic Concept Lattice, Grammatical Inference: Theoretical Results and Applications. Proceedings of the International Colloquium on Grammatical Inference, number 6339 in Lecture Notes in Computer Science, pp.38-51, 2010.
DOI : 10.1007/978-3-642-15488-1_5

A. Clark, A Language Theoretic Approach to Syntactic Structure, The Mathematics of Language, pp.39-56, 2011.
DOI : 10.1017/CBO9780511791222

A. Clark, Learning Trees from Strings: A Strong Learning Algorithm for some Context-Free Grammars, Journal of Machine Learning Research, vol.14, pp.3537-3559, 2014.

C. M. Cook, A. Rosenfeld, and A. R. Aronson, Grammatical inference by hill climbing, Information Sciences, vol.10, issue.2, pp.59-80, 1976.

C. Claudel-Renard, C. Chevalet, T. Faraut, and D. Kahn, Enzyme-specific profiles for genome annotation: PRIAM, Nucleic Acids Research, vol.31, pp.6633-6639.

Simplicity: A unifying principle in cognitive science?, Trends in Cognitive Sciences.

L. Duret and S. Abdeddaim, Multiple alignment for structural, functional, or phylogenetic analyses of homologous sequences, in Bioinformatics: Sequence, Structure, and Databanks.

M. O. Dayhoff (ed.), Atlas of Protein Sequence and Structure: Supplement No. 1, 1973.

Prediction of Enzyme Classification from Protein Sequence without the Use of Sequence Similarity, ISMB, pp.92-99, AAAI, 1997.

C. de la Higuera, Characteristic Sets for Polynomial Grammatical Inference, Machine Learning, pp.125-138, 1997.

Some classes of regular languages identifiable in the limit from positive data.

A stochastic context free grammar based framework for analysis of protein sequences.

Grammatical Inference: algorithms and applications.

P. Dupont, L. Miclet, and E. Vidal, What is the search space of the regular inference?, in Proceedings of the Second International Colloquium on Grammatical Inference (ICGI'94), 1994.

D. Devos and A. Valencia, Practical limits of function prediction, pp.98-107.

HMMER user's guide: biological sequence analysis using profile hidden Markov models, 1998.

[EPS+05] Improving Protein Function Prediction Using the Hierarchical Structure of the Gene Ontology, in CIBCB, pp.354-363.

R. Eyraud, Inférence grammaticale de langages hors-contextes, 2006.

N. K. Fox, S. E. Brenner, and J. Chandonia, SCOPe: Structural Classification of Proteins-extended, integrating SCOP and ASTRAL data and classification of new structures, Nucleic Acids Research, vol.42, issue.D1, pp.D304-D309, 2014.
DOI : 10.1093/nar/gkt1240

D. Fredouille, Inférence d'automates finis non déterministes par gestion de l'ambiguïté, en vue d'applications en bioinformatique, 2003.

J. C. Fay and C.-I. Wu, Sequence divergence, functional constraint, and selection in protein evolution, Annual Review of Genomics and Human Genetics, pp.213-235, 2003.

J. A. Gerlt and P. C. Babbitt, Can sequence determine function?

D. F. Gordon and M. desJardins, Evaluation and selection of biases in machine learning, Machine Learning, pp.5-22, 1995.

M. Y. Galperin and E. V. Koonin, From complete genome sequence to 'complete' understanding?, Trends in Biotechnology, vol.28, issue.8, pp.398-406, 2010.

M. Gebser, B. Kaufmann, and T. Schaub, Conflict-driven answer set solving: From theory to practice, Artif. Intell., vol.187, pp.52-89, 2012.

M. Gribskov, A. D. McLachlan, and D. Eisenberg, Profile analysis: detection of distantly related proteins, Proceedings of the National Academy of Sciences, pp.4355-4358, 1987.

B. Gaume, E. Navarro, and H. Prade, Clustering bipartite graphs in terms of approximate formal concepts and sub-contexts, pp.1125-1142.

E. M. Gold, Language identification in the limit, Information and Control, vol.10, issue.5, pp.447-474, 1967.

E. M. Gold, Complexity of Automaton Identification from Given Data, Information and Control, vol.37, issue.3, pp.302-320, 1978.

P. Grünwald, A minimum description length approach to grammar inference, Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language, pp.203-216, 1994.
DOI : 10.1007/3-540-60925-3_48

U. Göbel, C. Sander, R. Schneider, and A. Valencia, Correlated mutations and residue contacts in proteins, Proteins: Structure, Function, and Genetics, vol.18, issue.4, pp.309-317, 1994.
DOI : 10.1002/prot.340180402

X. Gu, Functional divergence in protein (family) sequence evolution, Genetica, vol.118, issue.2-3, p.133, 2003.
DOI : 10.1007/978-94-010-0229-5_4

P. Garcia and E. Vidal, Inference of k-Testable Languages in the Strict Sense and Application to Syntactic Pattern Recognition, Pattern Analysis and Machine Intelligence, pp.920-925, 1990.

P. Garcia, E. Vidal, and F. Casacuberta, Local Languages, the Successor Method, and a Step Towards a General Methodology for the Inference of Regular Grammars, Pattern Analysis and Machine Intelligence, pp.841-845, Nov. 1987.

P. Garcia, E. Vidal, and J. Oncina, Learning Locally Testable Languages in the Strict Sense, First Int. Workshop on Algorithmic Learning Theory, ALT'90, pp.325-338, 1990.

G. L. Holliday, D. E. Almonacid, G. J. Bartlett, N. M. Boyle, J. W. Torrance et al., MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms, Nucleic Acids Research, vol.35, issue.Database, pp.515-520, 2007.
DOI : 10.1093/nar/gkl774

N. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, E. D. Castro et al., The PROSITE database, Nucleic Acids Research, vol.34, issue.90001, pp.227-230, 2006.
DOI : 10.1093/nar/gkj063

J.-H. Hehemann, G. Correc, T. Barbeyron, W. Helbert, M. Czjzek et al., Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota, Nature, vol.464, issue.7290, pp.908-912, 2010.
DOI : 10.1038/nature08937

T. Head, Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviours, pp.737-759, 1987.

S. Henikoff and J. G. Henikoff, Amino acid substitution matrices from protein blocks, Proceedings of the National Academy of Sciences, vol.89, pp.10915-10919.

S. Henikoff, J. G. Henikoff, W. J. Alford, and S. Pietrokovski, Automated construction and graphical presentation of protein blocks from unaligned sequences, Gene, vol.163, 1995.

S. Hunter, P. Jones, A. Mitchell, R. Apweiler, T. K. Attwood et al., InterPro in 2011: new developments in the family and domain prediction database, Nucleic Acids Research, vol.40, issue.D1, pp.D306-D312, 2012.

M. Ikeda and A. Yamamoto, Classification by Selecting Plausible Formal Concepts in a Concept Lattice, Workshop on Formal Concept Analysis meets Information Retrieval (FCAIR2013), pp.22-35, 2013.

D. B. Janssen, Biocatalysis by Dehalogenating Enzymes, Advances in Applied Microbiology, vol.61, pp.233-252, 2007.
DOI : 10.1016/S0065-2164(06)61006-X

I. Jonassen, C. Helgesen, and D. Higgins, Scoring Function for Pattern Discovery Programs Taking Into Account Sequence Diversity, 1996.

G. Kerbellec, Apprentissage d'automates modélisant des familles de séquences protéiques, 2008.

L. N. Kinch and N. V. Grishin, Evolution of protein structures and functions, Current Opinion in Structural Biology, vol.12, issue.3, pp.400-408, 2002.

D. E. Koshland Jr., Application of a theory of enzyme specificity to protein synthesis, Proceedings of the National Academy of Sciences of the United States of America, vol.44, issue.2, p.98, 1958.

T. Koshiba, E. Mäkinen, and Y. Takada, Learning deterministic even linear languages from positive examples, Theoretical Computer Science, vol.185, issue.1, pp.63-79, 1997.

L. Kovacs, Generating decision tree from lattice for classification, 7th International Conference on Applied Informatics, pp.377-384, 2007.

E. Kuznetsova, M. Proudfoot, C. F. Gonzalez, G. Brown, M. V. Omelchenko et al., Genome-wide Analysis of Substrate Specificities of the Escherichia coli Haloacid Dehalogenase-like Phosphatase Family, Journal of Biological Chemistry, vol.281, issue.47, pp.36149-36161, 2006.

E. V. Koonin and R. L. Tatusov, Computer Analysis of Bacterial Haloacid Dehalogenases Defines a Large Superfamily of Hydrolases with Diverse Specificity: Application of an Iterative Approach to Database Search, Journal of Molecular Biology, vol.244, issue.1, pp.125-132, 1994.

T. F. Lee, The human genome project: Cracking the genetic code of life, 1991.

K. Lang, Random DFA's can be approximately learned from sparse uniform examples, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, pp.45-52, 1992.
DOI : 10.1145/130385.130390

A. Leroux, Inférence grammaticale sur des alphabets ordonnés : Application à la découverte de motifs dans des familles de protéines, 2005.

J. S. Liu, A. F. Neuwald, and C. E. Lawrence, Bayesian models for multiple local sequence alignment and Gibbs sampling strategies, Journal of the American Statistical Association, vol.90, issue.432, pp.1156-1170, 1995.

H. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer et al., MIPS: a database for genomes and protein sequences, Nucleic Acids Research, vol.27, issue.1, pp.44-48, 1999.

T. M. Mitchell, The need for biases in learning generalizations, 1980.

B. Morgenstern, DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment, Bioinformatics, vol.15, issue.3, pp.211-218, 1999.
DOI : 10.1093/bioinformatics/15.3.211

D. W. Mount, Comparison of the PAM and BLOSUM amino acid substitution matrices, Cold Spring Harbor Protocols, issue.6, p.59, 2008.

J. M. Cock, L. Sterck, P. Rouzé, D. Scornet, A. E. Allen et al., The Ectocarpus genome and the independent evolution of multicellularity in brown algae, Nature, vol.465, issue.7298, pp.617-621, 2010.

E. Navarro, H. Prade, and B. Gaume, Clustering Sets of Objects Using Concepts-Objects Bipartite Graphs, Lecture Notes in Computer Science, vol.7520, pp.420-432, 2012.
DOI : 10.1007/978-3-642-33362-0_32

URL : https://hal.archives-ouvertes.fr/hal-00992046

S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, vol.48, issue.3, pp.443-453, 1970.

J. Oncina and P. Garcia, Inferring regular languages in polynomial update time, Pattern Recognition and Image Analysis, pp.49-61, 1992.

W. Ogden, A helpful result for proving inherent ambiguity, Theory of Computing Systems, pp.191-194, 1968.
DOI : 10.1007/BF01694004

C. A. Orengo, A. D. Michie, D. T. Jones, M. B. Swindells, J. M. Thornton et al., CATH: A Hierarchic Classification of Protein Domain Structures, Structure, issue.5, pp.1093-1108, 1997.

Using secondary structures to measure the geometry of a protein.

L. Pitt, Inductive Inference, DFAs, and Computational Complexity, Proceedings of International Workshop on Analogical and Inductive Inference (AII), pp.18-44, 1989.

P. Peris, D. López, and M. Campos, IgTM: An algorithm to predict transmembrane domains and topology in proteins, BMC Bioinformatics, vol.9, issue.1, 2008.
DOI : 10.1186/1471-2105-9-367

P. Peris, D. López, M. Campos, and J. M. Sempere, Protein Motif Prediction by Grammatical Inference, in Sakakibara et al. [SKS+06], pp.175-187.
DOI : 10.1007/11872436_15

C. Pollard, Generalized phrase structure grammars, head grammars and natural language, 1984.

A. Payen and J. Persoz, Mémoire sur la diastase, les principaux produits de ses réactions

P. Radivojac, W. T. Clark, T. R. Oron, A. M. Schnoes et al., A large-scale evaluation of computational protein function prediction, 2013.

J. Rissanen, Modeling by the shortest data description, Automatica, vol.14, pp.465-471, 1978.

A. Ruepp, A. Zollner, D. Maier, K. Albermann et al., The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes, Nucleic Acids Research, vol.32, issue.18, pp.5539-5545.

M. Sahami, Learning Classification Rules Using Lattices, Lecture Notes in Computer Science, vol.912, pp.343-346, 1995.

I. Salvador et al., RNA Modeling by Combining Stochastic Context-Free Grammars and n-Gram Models, IJPRAI, pp.309-315.

[SBA+12] L. Sterck, K. Billiau, T. Abeel, P. Rouzé, and Y. Van de Peer, ORCAE: online resource for community annotation of eukaryotes, p.1041, 2012.

Stochastic Context-Free Grammars for Modeling RNA, HICSS (5), 1994.

F. Sanger and A. R. Coulson, A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase, Journal of Molecular Biology, vol.94, issue.3, pp.441-448, 1975.

H.-B. Shen and K.-C. Chou, EzyPred: a top-down approach for predicting enzyme functional classes and subclasses, Biochem. Biophys. Res. Commun., vol.364, issue.1, pp.53-59.

Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics, 1995.

[SCD+13] I. Sillitoe, A. L. Cuff, B. H. Dessailly, N. L. Dawson et al., New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures, Nucleic Acids Research, pp.490-498, 2013.
DOI : 10.1093/nar/gks1211

D. B. Searls, The language of genes, Nature, vol.10, issue.6912, pp.211-217, 2002.
DOI : 10.1038/29667

S. M. Sullivan and T. Holyoak, Enzymes with lid-gated active sites must operate by an induced fit mechanism instead of conformational selection, Proceedings of the National Academy of Sciences, vol.105, issue.37, pp.13829-13834, 2008.

Z. Solan, D. Horn, E. Ruppin, and S. Edelman, Unsupervised learning of natural languages, Proceedings of the National Academy of Sciences, vol.102, issue.33, pp.11629-11634, 2005.
DOI : 10.1073/pnas.0409746102

T. D. Schneider and R. M. Stephens, Sequence logos: a new way to display consensus sequences, Nucleic Acids Research, vol.18, issue.20, pp.6097-6100, 1990.

A. Seifried, J. Schultz, and A. Gohla, Human HAD phosphatases: structure, mechanism, and roles in health and disease, FEBS Journal, vol.14, issue.Pt 2, pp.549-571, 2013.
DOI : 10.1111/j.1742-4658.2012.08633.x

T. F. Smith and M. S. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology, pp.195-197, 1981.

U. Syed and G. Yona, Enzyme Function Prediction with Interpretable Models, 2009.
DOI : 10.1007/978-1-59745-243-4_17

K. F. Tipton and S. Boyce, History of the enzyme nomenclature system, Bioinformatics, vol.16, issue.1, pp.34-40, 2000.

J. D. Thompson, D. G. Higgins, and T. J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research, vol.22, issue.22, pp.4673-4680, 1994.
DOI : 10.1093/nar/22.22.4673

V. Vapnik, Principles of risk minimization for learning theory, in Advances in Neural Information Processing Systems, pp.831-838, 1992.

ABL: Alignment-Based Learning, COLING 18, pp.961-967.

J. D. Watson and F. H. C. Crick, Molecular structure of nucleic acids, Nature, vol.171, issue.4356, pp.737-738, 1953.

E. C. Webb, The nomenclature of multiple enzyme forms, Experientia, vol.20, issue.10, p.592, 1964.

E. C. Webb, Enzyme nomenclature: a personal retrospective, The FASEB Journal, vol.7, issue.12, pp.1192-1194, 1993.

R. Wille, Restructuring lattice theory: An approach based on hierarchies of concepts, Ordered Sets, pp.445-470, 1982.

C. A. Wilson, J. Kreychman, and M. Gerstein, Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores, Journal of Molecular Biology, vol.297, issue.1, pp.233-249, 2000.

J. Wang, J. Liang, and Y. Qian, Closed-Label Concept Lattice Based Rule Extraction Approach, Lecture Notes in Computer Science, vol.1, issue.6, pp.690-698, 2011.
DOI : 10.1007/978-3-642-59830-2

J. G. Wolff, Language acquisition and the discovery of phrase structure, Language and Speech, vol.23, issue.3, pp.255-269, 1980.

T. C. Wood and W. R. Pearson, Evolution of Protein Sequences and Structures, Journal of Molecular Biology, vol.291, issue.4, pp.977-995, 1999.

Y. Wang, Z. Yang, and N. Deng, SVM-based Method for Predicting Enzyme Function in a Hierarchical Context, The Fourth International Conference on Computational Systems Biology (ISB2010), pp.119-127, 2010.

T. Yokomori, N. Ishida, and S. Kobayashi, Learning Local Languages and its Application to Protein α-Chain Identification, HICSS (5), pp.113-122, 1994.

R. Yoshinaka, Identification in the Limit of (k,l)-Substitutable Context-Free Languages, Proceedings of the 9th international colloquium conference on Grammatical inference: theoretical results and applications, ICGI'08, pp.266-279, 2008.

R. Yoshinaka, Efficient learning of multiple context-free languages with multidimensional substitutability from positive data, Theoretical Computer Science, vol.412, issue.19, pp.1821-1831, 2011.

J. Zhang, R. Chiodini, A. Badr, and G. Zhang, The impact of next-generation sequencing on genomics, Journal of Genetics and Genomics, vol.38, issue.3, pp.95-109, 2011.

A. In the alignment of Figure 1, each path, drawn as a series of arcs, represents a sequence, and the blocks are drawn as rectangles covering certain positions of the sequences. The color of each sequence indicates its membership in a given class, defined as follows: Agarase (EC