N. Abe and H. Mamitsuka, Predicting protein secondary structure using stochastic tree grammars, Machine Learning, vol.29, pp.275-301, 1997.

H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. T. Bhat et al., The Protein Data Bank, Nucleic Acid Research, vol.28, pp.235-242, 2000.

K. M-bohren, . Bullock, K. Wermuth, and . Gabbay, The aldo-keto reductase superfamily. cDNAs and deduced amino acid sequences of human aldehyde and aldose reductases, Journal of Biological Chemistry, vol.264, issue.16, pp.9547-51, 1989.

A. Bretaudeau, F. Coste, F. Humily, L. Garczarek, G. L. Corguillé et al., CyanoLyase: a database of phycobilin lyase sequences, motifs and functions, Nucleic Acids Research, vol.41, pp.396-401, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01094087

F. Coste and G. Kerbellec, Learning Automata on Protein Sequences, JOBIM, pp.199-210, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00180429

J. Czekanowski, Zur differential Diagnose der Neandertalgruppe. Korrespondenzblatt der deutschen Gesellschaft für Anthropologie, vol.40, pp.44-47, 1909.

A. Daskalov, W. Dyrka, and S. J. Saupe, Theme and variations: evolutionary diversification of the HET-s functional amyloid motif, Scientific Reports, vol.5, p.12494, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01202835

T. M. Oliveira, P. Delatorre, B. A. Da-rocha, E. P. Souza, K. S. Nascimento et al., Crystal structure of Dioclea rostrata lectin: Insights into understanding the pH-dependent dimer-tetramer equilibrium and the structural basis for carbohydrate recognition in Diocleinae lectins, Journal of Structural Biology, vol.164, issue.2, pp.177-182, 2008.

. Lee-raymond-dice, Measures of the amount of ecologic association between species, Ecology, vol.26, issue.3, pp.297-302, 1945.

R. D. Dowell and S. R. Eddy, Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction, BMC Bioinformatics, vol.5, issue.1, p.71, 2004.

W. Dyrka, Probabilistic context-free grammar for pattern detection in protein sequences, Information Systems and Mathematics, 2007.

W. Dyrka and J. Nebel, A stochastic context free grammar based framework for analysis of protein sequences, BMC Bioinformatics, vol.10, p.323, 2009.

W. Dyrka, J. Nebel, and M. Kotulska, Probabilistic grammatical model for helix-helix contact site classification, Algorithms for Molecular Biology, vol.8, issue.1, p.31, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00925929

W. Dyrka, F. Coste, and J. Talibart, Estimating probabilistic context-free grammars for proteins using contact map constraints. under review, preprint at arxiv.org, 2018.

S. Eddy, Profile hidden Markov models, Bioinformatics, vol.14, issue.9, pp.755-763, 1998.

S. R. Eddy, Accelerated profile HMM searches, PLoS Computational Biology, vol.7, issue.10, p.1002195, 2011.

S. R. Eddy and R. Durbin, RNA sequence analysis using covariance models, Nucleic Acids Research, vol.22, issue.11, pp.2079-2088, 1994.

T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters, vol.27, issue.8, pp.167-8655, 2006.

R. D. Finn, P. Coggill, R. Y. Eberhardt, S. R. Eddy, J. Mistry et al., The pfam protein families database: towards a more sustainable future, Nucleic Acids Research, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01294685

E. R. Gansner and S. C. North, An open graph visualization system and its applications to software engineering. Software Practice and Experience, vol.30, pp.1203-1236, 2000.

J. D. Hunter, Matplotlib: A 2d graphics environment, Computing in Science & Engineering, vol.9, issue.3, pp.90-95, 2007.

B. Knudsen and J. Hein, RNA secondary structure prediction using stochastic context-free grammars and evolutionary history, Bioinformatics, vol.15, pp.446-54, 1999.

B. Knudsen and J. Hein, Pfold: RNA secondary structure prediction using stochastic context-free grammars, Nucleic Acids Research, vol.31, issue.13, pp.3423-3428, 2003.

M. Knudsen, Stochastic context-free grammars and RNA secondary structure prediction, 2005.

H. Mamitsuka and . Abe, Predicting location and structure of betasheet regions using stochastic tree grammars, Second International Conference on Intelligent Systems for Molecular Biology, pp.276-284, 1994.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

M. Remmert, A. Biegert, A. Hauser, and J. Soeding, HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment, Nature Methods, vol.9, issue.2, pp.173-175, 2012.

Y. Sakakibara, R. Brown, I. Underwood, and . Mian, Stochastic context-free grammars for modeling RNA, 27th Hawaii Int Conf System Sciences, pp.349-58, 1993.

Y. Sakakibara, Efficient learning of context-free grammars from positive structural examples. Information and Computation, vol.97, pp.23-60, 1992.

E. Sciacca, S. Spinella, D. Ienco, and P. Giannini, Annotated stochastic context free grammars for analysis and synthesis of proteins, Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol.6623, pp.978-981, 2011.

N. Sharon and H. Lis, Legume lectins-a large family of homologous proteins, The FASEB Journal, vol.4, issue.14, p.2227211, 1990.

J. A. Christian, E. Sigrist, L. De-castro, . Cerutti, A. Béatrice et al., Alan Bridge, Lydie Bougueleret, and Ioannis Xenarios. New and continuing developments at PROSITE, Nucleic Acids Research, vol.41, issue.D1, pp.344-347, 2013.

J. Soeding, Protein homology detection by HMM-HMM comparison, Bioinformatics, vol.21, issue.7, pp.951-960, 2005.

T. Sørensen, A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons, Kongelige Danske Videnskabernes Selskab, vol.5, issue.4, pp.1-34, 1948.

Z. Sükösd, B. Knudsen, J. Kjems, and C. N. Pedersen, PPfold 3.0: fast RNA secondary structure prediction using phylogeny and auxiliary data, Bioinformatics, vol.28, issue.20, pp.2691-2692, 2012.

C. Héì-ene-van-melckebeke, A. Wasmer, . Lange, A. B. Eiso, A. Loquet et al., Atomic-resolution three-dimensional structure of HETs(218-289) amyloid fibrils by solid-state NMR spectroscopy, Journal of the American Chemical Society, vol.132, issue.39, p.20828131, 2010.

G. Van-rossum and J. De-boer, Interactively testing remote servers using the Python programming language, CWI Quarterly, vol.4, pp.283-303, 1991.

J. Waldispuehl and J. Steyaert, Modeling and predicting all-transmembrane proteins including helix-helix pairing, Theoretical Computer Science, vol.335, pp.67-92, 2005.

J. Waldispuehl, B. Berger, P. Clote, and J. Steyaert, Predicting transmembrane betabarrels and interstrand residue interactions from sequence, Proteins: Structure, Function and Genetics, vol.65, issue.1, pp.61-74, 2006.

J. Waldispuehl, C. W. O'donnell, S. Devadas, P. Clote, and B. Berger, Modeling ensembles of transmembrane beta-barrel proteins, Proteins: Structure, Function and Genetics, vol.71, issue.3, pp.1097-1112, 2008.

S. Wang, S. Sun, Z. Li, R. Zhang, and J. Xu, Accurate de novo prediction of protein contact map by ultra-deep learning model, PLOS Computational Biology, vol.13, issue.1, pp.1-34, 2017.

M. Weigt, . White, J. A. Szurmant, T. Hoch, and . Hwa, Identification of direct residue contacts in protein-protein interaction by message passing, Proceedings of the National Academy of Sciences, vol.106, pp.67-72, 2009.

P. Pawel, M. Wozniak, and . Kotulska, Characteristics of protein residue-residue contacts and their application in contact prediction, Journal of Molecular Modeling, vol.20, issue.11, p.2497, 2014.