C. Ponting and N. Dickens, Genome cartography through domain annotation, Genome Biology, vol.2, pp.2006-2006, 2001.

R. Finn, J. Mistry, J. Tate, P. Coggill, A. Heger et al., The Pfam protein families database, Nucleic Acids Research, vol.38, issue.Database, pp.211-222, 2010.
DOI : 10.1093/nar/gkp985

URL : https://hal.archives-ouvertes.fr/hal-01294685

J. Darnell and W. Doolittle, Speculations on the early course of evolution, Proceedings of the National Academy of Sciences, vol.83, issue.5, pp.1271-1275, 1986.
DOI : 10.1073/pnas.83.5.1271

C. Sander and R. Schneider, Database of homology-derived protein structures and the structural meaning of sequence alignment, Proteins: Structure, Function, and Genetics, vol.9, issue.1, pp.56-68, 1991.
DOI : 10.1002/prot.340090107

D. Bashford, C. Chothia, and A. Lesk, Determinants of a protein fold, Journal of Molecular Biology, vol.196, issue.1, pp.199-216, 1987.
DOI : 10.1016/0022-2836(87)90521-3

A. Lesk, M. Levitt, and C. Chothia, Alignment of the amino acid sequences of distantly related proteins using variable gap penalties, Protein Engineering, Design and Selection, vol.1, issue.1, pp.77-78, 1986.
DOI : 10.1093/protein/1.1.77

I. Callebaut, K. Prat, E. Meurice, J. Mornon, and S. Tomavo, Prediction of the general transcription factors associated with RNA polymerase II in Plasmodium falciparum: conserved features and differences relative to other eukaryotes, BMC Genomics, vol.6, 2005.
PMID : 16042788
URL : https://hal.archives-ouvertes.fr/hal-00021609

J. Baussand, C. Deremble, and A. Carbone, Periodic distributions of hydrophobic amino acids allows the definition of fundamental building blocks to align distantly related proteins, Proteins: Structure, Function, and Bioinformatics, vol.67, issue.3, pp.695-708, 2007.
DOI : 10.1002/prot.21319

R. Hughey and A. Krogh, Hidden Markov models for sequence analysis: extension and analysis of the basic method, Bioinformatics, vol.12, issue.2, pp.95-107, 1996.
DOI : 10.1093/bioinformatics/12.2.95

Y. Loewenstein, D. Raimondo, O. Redfern, J. Watson, D. Frishman et al., Protein function annotation by homology-based inference, Genome Biology, vol.10, issue.2, p.207, 2009.
DOI : 10.1186/gb-2009-10-2-207

J. Gough, K. Karplus, R. Hughey, and C. Chothia, Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure, Journal of Molecular Biology, vol.313, issue.4, pp.903-919, 2001.
DOI : 10.1006/jmbi.2001.5080

G. Yona and M. Levitt, Within the twilight zone: a sensitive profile-profile comparison tool based on information theory, Journal of Molecular Biology, vol.315, issue.5, pp.1257-1275, 2002.
DOI : 10.1006/jmbi.2001.5293

B. Brandt and J. Heringa, webPRC: the Profile Comparer for alignment-based searching of public domain databases, Nucleic Acids Research, vol.37, issue.Web Server, pp.48-52, 2009.
DOI : 10.1093/nar/gkp279

R. Sadreyev, D. Baker, and N. Grishin, Profile-profile comparisons by COMPASS predict intricate homologies between protein families, Protein Science, vol.12, issue.10, pp.2262-2272, 2003.
DOI : 10.1110/ps.03197403

M. Wistrand and E. Sonnhammer, Improving Profile HMM Discrimination by Adapting Transition Probabilities, Journal of Molecular Biology, vol.338, issue.4, pp.847-854, 2004.
DOI : 10.1016/j.jmb.2004.03.023

J. Bernardes, A. Carbone, and G. Zaverucha, A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models, BMC Bioinformatics, vol.12, issue.1, 2011.
DOI : 10.1186/1471-2105-12-83
PMID : 21429187

URL : https://hal.archives-ouvertes.fr/hal-00684137

M. Remmert, A. Biegert, A. Hauser, and J. Söding, HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment, Nature Methods, vol.9, issue.2, pp.173-175, 2012.
DOI : 10.1038/nmeth.1818

H. Mi, A. Muruganujan, and P. Thomas, PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees, Nucleic Acids Research, vol.41, issue.D1, pp.377-386, 2013.
DOI : 10.1093/nar/gks1118

J. Lees, C. Yeats, J. Perkins, I. Sillitoe, R. Rentzsch et al., Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis, Nucleic Acids Research, vol.40, issue.D1, pp.465-471, 2012.
DOI : 10.1093/nar/gkr1181

N. Fox and S. Brenner, SCOPe: Structural Classification of Proteins-extended, integrating SCOP and ASTRAL data and classification of new structures, Nucleic Acids Research, vol.42, issue.D1, pp.304-309, 2014.
DOI : 10.1093/nar/gkt1240

J. Lees, D. Lee, R. Studer, N. Dawson, I. Sillitoe et al., Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis, Nucleic Acids Research, vol.42, issue.D1, pp.240-245, 2014.
DOI : 10.1093/nar/gkt1205

C. Yeats, O. Redfern, and C. Orengo, A fast and automated solution for accurately resolving protein domain architectures, Bioinformatics, vol.26, issue.6, pp.745-751, 2010.
DOI : 10.1093/bioinformatics/btq034

N. Terrapon, O. Gascuel, E. Maréchal, and L. Bréhélin, Detection of new protein domains using co-occurrence: application to Plasmodium falciparum, Bioinformatics, vol.25, issue.23, pp.3077-3083, 2009.
DOI : 10.1093/bioinformatics/btp560

URL : https://hal.archives-ouvertes.fr/lirmm-00431171

E. Bischoff and C. Vaquero, In silico and biological survey of transcription-associated proteins implicated in the transcriptional machinery during the erythrocytic development of Plasmodium falciparum, BMC Genomics, vol.11, issue.1, p.34, 2010.
DOI : 10.1186/1471-2164-11-34

URL : https://hal.archives-ouvertes.fr/pasteur-00663529

A. Ochoa, M. Llinás, and M. Singh, Using context to improve protein domain identification, BMC Bioinformatics, vol.12, issue.1, 2011.
DOI : 10.1186/1471-2105-12-90
PMID : 21453511

J. Bernardes, F. Vieira, G. Zaverucha, and A. Carbone, A multi-objective optimization approach accurately resolves protein domain architectures, Bioinformatics, vol.32, issue.3, pp.345-353, 2016.
DOI : 10.1093/bioinformatics/btv582

URL : https://hal.archives-ouvertes.fr/hal-01285556

B. Boser, I. Guyon, and V. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, pp.144-152, 1992.
DOI : 10.1145/130385.130401

B. Rost, Twilight zone of protein sequence alignments, Protein Engineering, Design and Selection, vol.12, issue.2, pp.85-94, 1999.
DOI : 10.1093/protein/12.2.85

C. Aurrecoechea, J. Brestelli, B. Brunk, J. Dommer, S. Fischer et al., PlasmoDB: a functional genomic database for malaria parasites, Nucleic Acids Research, vol.37, issue.Database, pp.539-543, 2009.
DOI : 10.1093/nar/gkn814

S. Date and C. Stoeckert, Computational modeling of the Plasmodium falciparum interactome reveals protein function on a genome-wide scale, Genome Research, vol.16, issue.4, pp.542-549, 2006.
DOI : 10.1101/gr.4573206

F. Lu, H. Jiang, J. Ding, J. Mu, J. Valenzuela et al., cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome, BMC Genomics, vol.8, issue.1, p.255, 2007.
DOI : 10.1186/1471-2164-8-255
PMID : 17662120

S. Eddy, Accelerated Profile HMM Searches, PLoS Computational Biology, vol.7, issue.10, 2011.
DOI : 10.1371/journal.pcbi.1002195

C. Vogel, C. Berzuini, M. Bashton, J. Gough, and S. Teichmann, Supra-domains: Evolutionary Units Larger than Single Protein Domains, Journal of Molecular Biology, vol.336, issue.3, pp.809-823, 2004.
DOI : 10.1016/j.jmb.2003.12.026

L. Coin, A. Bateman, and R. Durbin, Enhanced protein domain discovery by using language modeling techniques from speech recognition, Proceedings of the National Academy of Sciences, vol.100, issue.8, pp.4516-4520, 2003.
DOI : 10.1073/pnas.0737502100

A. Moore, A. Björklund, D. Ekman, E. Bornberg-Bauer, and A. Elofsson, Arrangements in the modular evolution of proteins, Trends in Biochemical Sciences, vol.33, issue.9, pp.444-451, 2008.
DOI : 10.1016/j.tibs.2008.05.008

E. Marcotte, M. Pellegrini, H. Ng, D. Rice, T. Yeates et al., Detecting Protein Function and Protein-Protein Interactions from Genome Sequences, Science, vol.285, issue.5428, pp.751-753, 1999.
DOI : 10.1126/science.285.5428.751

G. Apic, J. Gough, and S. Teichmann, Domain combinations in archaeal, eubacterial and eukaryotic proteomes, Journal of Molecular Biology, vol.310, issue.2, pp.311-325, 2001.
DOI : 10.1006/jmbi.2001.4776

S. Wuchty and E. Almaas, Evolutionary cores of domain co-occurrence networks, BMC Evolutionary Biology, vol.5, 2005.
PMID : 15788102

A. Murzin, S. Brenner, T. Hubbard, and C. Chothia, SCOP: A structural classification of proteins database for the investigation of sequences and structures, Journal of Molecular Biology, vol.247, issue.4, pp.536-540, 1995.
DOI : 10.1016/S0022-2836(05)80134-2

J. Söding, Protein homology detection by HMM-HMM comparison, Bioinformatics, vol.21, issue.7, pp.951-960, 2005.
DOI : 10.1093/bioinformatics/bti125

M. Gouy, S. Guindon, and O. Gascuel, SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building, Molecular Biology and Evolution, vol.27, issue.2, pp.221-224, 2010.
DOI : 10.1093/molbev/msp259

URL : https://hal.archives-ouvertes.fr/lirmm-00705187

P. Keeling, G. Burger, D. Durnford, B. Lang, R. Lee et al., The tree of eukaryotes, Trends in Ecology & Evolution, vol.20, issue.12, pp.670-676, 2005.
DOI : 10.1016/j.tree.2005.09.005

M. Rehmsmeier and M. Vingron, Phylogenetic information improves homology detection, Proteins: Structure, Function, and Genetics, vol.45, issue.4, pp.360-371, 2001.
DOI : 10.1002/prot.1156

R. Finn, J. Mistry, B. Schuster-Böckler, S. Griffiths-Jones, V. Hollich et al., Pfam: clans, web tools and services, Nucleic Acids Research, vol.34, issue.Database, pp.247-251, 2006.
DOI : 10.1093/nar/gkj149

B. Mirkin, T. Fenner, M. Galperin, and E. Koonin, Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes, BMC Evolutionary Biology, vol.3, issue.2, 2003.
PMID : 12515582

S. Yang and P. Bourne, The Evolutionary History of Protein Domains Viewed by Species Phylogeny, PLoS ONE, vol.4, issue.12, 2009.
DOI : 10.1371/journal.pone.0008378

A. Björklund, D. Ekman, S. Light, J. Frey-Skött, and A. Elofsson, Domain Rearrangements in Protein Evolution, Journal of Molecular Biology, vol.353, issue.4, pp.911-923, 2005.
DOI : 10.1016/j.jmb.2005.08.067

S. Pasek, J. Risler, and P. Brezellec, Gene fusion/fission is a major contributor to evolution of multi-domain bacterial proteins, Bioinformatics, vol.22, issue.12, pp.1418-1423, 2006.
DOI : 10.1093/bioinformatics/btl135

M. Pellegrini, E. Marcotte, M. Thompson, D. Eisenberg, and T. Yeates, Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles, Proceedings of the National Academy of Sciences, vol.96, issue.8, pp.4285-4288, 1999.
DOI : 10.1073/pnas.96.8.4285

S. Altschul, T. Madden, A. Schaffer, J. Zhang, Z. Zhang et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, vol.25, issue.17, pp.3389-3402, 1997.
DOI : 10.1093/nar/25.17.3389

E. Frank, Y. Wang, S. Inglis, G. Holmes, and I. Witten, Using Model Trees for Classification, Machine Learning, vol.32, issue.1, pp.63-76, 1998.
DOI : 10.1023/A:1007421302149

W. McLaughlin, K. Chen, T. Hou, and W. Wang, On the detection of functionally coherent groups of protein domains with an extension to protein annotation, BMC Bioinformatics, vol.8, issue.1, p.390, 2007.
DOI : 10.1186/1471-2105-8-390
PMID : 17937820

M. Scott, D. Thomas, and M. Hallett, Predicting Subcellular Localization via Protein Motif Co-Occurrence, Genome Research, vol.14, issue.10a, pp.1957-1966, 2004.
DOI : 10.1101/gr.2650004

R. Apweiler, A. Bairoch, C. Wu, W. Barker, B. Boeckmann et al., UniProt: the Universal Protein knowledgebase, Nucleic Acids Research, vol.32, pp.115-119, 2004.

P. Brazdil, C. Giraud-Carrier, C. Soares, and R. Vilalta, Metalearning, 2009.
DOI : 10.1007/978-1-4899-7502-7_543-1

D. Wolpert, Stacked generalization, Neural Networks, vol.5, issue.2, pp.241-259, 1992.
DOI : 10.1016/S0893-6080(05)80023-1

J. Platt, N. Cristianini, and J. Shawe-Taylor, Large margin DAGs for multiclass classification, In: Advances in Neural Information Processing Systems, vol.12, pp.547-553, 2000.

P. Stothard, The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences, Biotechniques, vol.28, pp.1102-1104, 2000.

J. Platt, Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, In: Advances in Large Margin Classifiers, pp.61-74, 1999.

A. Anand, G. Pugalenthi, and P. Suganthan, Predicting protein structural class by SVM with class-wise optimized features and decision probabilities, Journal of Theoretical Biology, vol.253, issue.2, pp.375-380, 2008.
DOI : 10.1016/j.jtbi.2008.02.031

P. Domingos and M. Pazzani, On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning, vol.29, issue.2/3, pp.103-130, 1997.
DOI : 10.1023/A:1007413511361

C. Chang and C. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, pp.1-27, 2011.
DOI : 10.1145/1961189.1961199

J. Thompson, D. Higgins, and T. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research, vol.22, issue.22, pp.4673-4680, 1994.
DOI : 10.1093/nar/22.22.4673

M. Larkin, G. Blackshields, N. Brown, R. Chenna, P. McGettigan et al., Clustal W and Clustal X version 2.0, Bioinformatics, vol.23, issue.21, pp.2947-2948, 2007.
DOI : 10.1093/bioinformatics/btm404

URL : https://hal.archives-ouvertes.fr/hal-00206210