
hal-00640571, version 1

Genolevures: automated annotation of yeast genome sequences

Tiphaine Martin (a, 1, 2)

Comparative Genomics of Eukaryotic Microorganisms (2011)

Abstract: Genome annotation is subdivided into two phases: syntactic annotation, i.e. the prediction and location of various chromosomal elements (protein-coding genes, tRNA genes), and functional annotation, i.e. the assignment of a biological function to each element, often on the basis of comparison with known sequences. Our automated annotation pipeline integrates predictions of several types of objects and annotates them stepwise. (1) Seven different algorithms predict protein-coding genes using the same training set, which contains coding sequences with and without introns. (2) The contigs are aligned (a) with BLASTn to non-coding elements of reference species, (b) with tBLASTn to the proteomes of reference species and to UniProt, and (c) with PSI-tBLASTn to PSSMs representative of protein families. (3) Other chromosomal elements are predicted either by Consortium experts or by specific bioinformatics tools. (4) Overlap conflicts between elements are resolved by taking into account predicted gene models, other chromosomal elements, and similarity regions. (5) The resulting elements are then submitted to functional annotation, based on a decision tree inspired by previous semi-automated annotation projects conducted by the Génolevures Consortium. The functional annotation text of a predicted gene model follows the same rules as previous Génolevures annotations. This automated annotation pipeline links together widely used bioinformatics tools and specific scripts, using data files in standard formats.
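Step (4) starts from the set of elements whose locations overlap. The abstract does not give the rules used to resolve those conflicts, so the following is only a minimal sketch of the detection half of that step. It assumes each predicted element is reduced to a hypothetical `Element` record with 1-based, inclusive contig coordinates; the names `Element` and `find_overlaps` are illustrative and are not part of the Génolevures pipeline. It uses a standard sort-and-sweep over each contig.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    """A predicted chromosomal element (hypothetical record layout)."""
    name: str
    kind: str    # e.g. "CDS" or "tRNA"
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def find_overlaps(elements: List[Element]) -> List[Tuple[str, str]]:
    """Return name pairs of elements sharing at least one base on the same contig."""
    conflicts: List[Tuple[str, str]] = []
    # Sort by contig, then start: each element then only needs comparing
    # with earlier elements on the same contig that are still "open".
    ordered = sorted(elements, key=lambda e: (e.contig, e.start, e.end))
    active: List[Element] = []
    current_contig: Optional[str] = None
    for elem in ordered:
        if elem.contig != current_contig:
            # New contig: nothing from the previous contig can overlap.
            active = []
            current_contig = elem.contig
        # Drop elements ending before this one starts; since later elements
        # start even further right, they cannot overlap those either.
        active = [a for a in active if a.end >= elem.start]
        # Every remaining active element starts at or before elem.start and
        # ends at or after it, so it overlaps elem.
        for a in active:
            conflicts.append((a.name, elem.name))
        active.append(elem)
    return conflicts


if __name__ == "__main__":
    predicted = [
        Element("g1", "CDS", "c1", 100, 900),
        Element("t1", "tRNA", "c1", 850, 921),
        Element("g2", "CDS", "c1", 1000, 2000),
        Element("g3", "CDS", "c2", 100, 500),
    ]
    print(find_overlaps(predicted))  # [('g1', 't1')]
```

Deciding which member of each returned pair to keep, using predicted gene models, other chromosomal elements, and similarity regions as the abstract describes, would be applied to these pairs in a separate step that is not sketched here.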

  • a: CNRS
  • 1: Laboratoire Bordelais de Recherche en Informatique (LaBRI)
    CNRS: UMR5800 – Université Sciences et Technologies - Bordeaux I – École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB) – Université Victor Segalen - Bordeaux II
  • 2: Magnome (INRIA Bordeaux - Sud-Ouest)
    CNRS: UMR5800 – INRIA – Université Sciences et Technologies - Bordeaux I – École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)
  • Domain: Life Sciences/Quantitative Methods
    Computer Science/Bioinformatics
 
  • oai:hal.inria.fr:hal-00640571
  • Submitted on: Sunday, 13 November 2011 15:58:19
  • Updated on: Sunday, 13 November 2011 16:01:26