Genolevures: automated annotation of yeast genome sequences

Tiphaine Martin 1, 2
1 MAGNOME - Models and Algorithms for the Genome
Inria Bordeaux - Sud-Ouest, UB - Université de Bordeaux, CNRS - Centre National de la Recherche Scientifique : UMR5800
Abstract: Genome annotation is divided into two phases: syntactic annotation, i.e. the prediction and location of various chromosomal elements (protein-coding genes, tRNA genes), and functional annotation, in which each element is assigned a biological function, often on the basis of comparison with known sequences. Our automated annotation pipeline integrates predictions of several types of objects and annotates stepwise. (1) Seven different algorithms predict protein-coding genes using the same training set, which contains coding sequences with and without introns. (2) The contigs are aligned with (a) BLASTn against non-coding elements of reference species, (b) tBLASTn against the proteomes of reference species and UniProt, and (c) PSI-tBLASTn against PSSMs representative of protein families. (3) Other chromosomal elements are predicted either by Consortium experts or by specific bioinformatics tools. (4) Overlap conflicts between elements are resolved by taking into account predicted gene models, other chromosomal elements, and similarity regions. (5) The resulting elements are then submitted to functional annotation, based on a decision tree inspired by previous semi-automated annotation projects conducted by the Génolevures Consortium. The functional annotation text of a predicted gene model follows the same rules as previous Génolevures annotations. This automated annotation pipeline links together widely used bioinformatics tools and specific scripts, using data files in standard formats.
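As an illustration of step (4), the sketch below shows one simple way to resolve overlap conflicts between predicted elements. It is not the Consortium's method: the abstract does not specify the resolution rules, so the greedy "keep the highest-scoring element" strategy, the Element fields, the score values, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A predicted chromosomal element (hypothetical structure for this sketch)."""
    contig: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    kind: str     # e.g. "CDS" or "tRNA"
    score: float  # assumed confidence value; higher is better


def overlaps(a: Element, b: Element) -> bool:
    """Return True if both elements lie on the same contig and share a position."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def resolve_overlaps(elements: list[Element]) -> list[Element]:
    """Greedy resolution (an assumption, not the published rule set):
    visit elements from highest to lowest score and keep each one only
    if it does not overlap an element that was already kept."""
    kept: list[Element] = []
    for candidate in sorted(elements, key=lambda e: e.score, reverse=True):
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    # Report the surviving elements in genomic order.
    return sorted(kept, key=lambda e: (e.contig, e.start))


# Made-up predictions, for illustration only.
predictions = [
    Element("contig1", 100, 500, "CDS", 0.9),
    Element("contig1", 450, 800, "CDS", 0.6),   # overlaps the first CDS
    Element("contig1", 900, 980, "tRNA", 0.8),
    Element("contig2", 100, 500, "CDS", 0.5),   # different contig, no conflict
]
kept = resolve_overlaps(predictions)
for element in kept:
    print(element.contig, element.start, element.end, element.kind)
```

A real pipeline would likely weigh element types and similarity evidence rather than a single score, as the abstract indicates; the single score here only keeps the example short.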
Document type:
Conference paper
Comparative Genomics of Eukaryotic Microorganisms, Oct 2011, Sant Feliu de Guixols, Spain. 2011
  • HAL Id: hal-00640571, version 1



