Improvement of the assembly of heterozygous genomes of non-model organisms, a case study of the genomes of two Spodoptera frugiperda host strains

Abstract: Extracting biological information from the draft genomes of non-model organisms may prove impossible, or may lead to incomplete or even wrong conclusions. In particular, the combination of a high level of heterozygosity and short-read sequencing may have a major impact on gene annotation [1,2]. An incorrect assessment of gene content is usually the consequence of high fragmentation of the genome sequence, but it may also come from an overestimation of the genome size. This overestimation arises because the assembly of a heterozygous region in which the two haplotypes diverge significantly sometimes produces two different contigs instead of one consensus sequence. New assemblers such as Platanus [3] have been developed to handle heterozygous data. However, completely re-assembling a genome, which entails new rounds of automatic and manual annotation, is very costly and may still produce erroneous scaffolds and annotations. We therefore set up a "soft" method to detect and correct false duplications due to heterozygosity in draft assemblies. In addition to identifying and removing the allelic regions (i.e. unmerged haplotypes), our protocol is able to relocate and merge supernumerary gene annotations. We applied this method as a prerequisite for the comparison of the genomes of two Spodoptera frugiperda (Lepidoptera: Noctuidae) strains, within the framework of the WGS project supported by the Fall armyworm International Public Consortium (FAW-IPC). This moth is a well-known crop pest throughout the Western hemisphere. The species consists of two strains adapted to different larval host plants: the first feeds preferentially on corn, cotton and sorghum, whereas the second is more associated with rice and several pasture grasses.
Whereas the paired-end reads of the rice variant were assembled directly using Platanus [3], we cleaned up and corrected the first release of the corn variant assembly. This led to a drastic reduction in assembly size, with the removal of 88 Mbp (17%), and to an increase of the N50 from 39,593 to 52,781 bp. The removed fragments included 3,746 gene predictions; about 80% of them have been either relocated or merged with their complementary allele. Subsequently, to find new candidate genes or genomic regions involved in host-plant adaptation, we compared the genomes and proteomes of the two strains to identify orthologous genes, collinear regions and genome rearrangements, taking into consideration the inflated occurrence of split genes due to the high fragmentation of the genome.
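The N50 reported above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below is only an illustration of why removing short redundant contigs can raise the N50. The function name `n50` and the toy contig lengths are assumptions made for this example; they are not data or code from the poster.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches at least half of the assembly size; the length
    of the contig that crosses this threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: four contigs plus four short fragments that
# stand in for unmerged allelic copies (illustrative values only).
draft = [100, 80, 60, 40, 30, 30, 30, 30]
cleaned = [100, 80, 60, 40]  # same assembly after removing the fragments

print(n50(draft))    # N50 before clean-up
print(n50(cleaned))  # N50 after clean-up
```

In this toy case the total size drops from 400 to 280 while the N50 rises from 60 to 80, the same direction of change as the one reported for the corn variant assembly.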
Document type:
Poster
Arthropod Genomics 2015, Jun 2015, Manhattan (Kansas), United States. 2015

https://hal.inria.fr/hal-01240443
Contributor: Fabrice Legeai
Submitted on: Wednesday, 9 December 2015 - 11:32:03
Last modified on: Wednesday, 16 May 2018 - 11:23:53
Document(s) archived on: Thursday, 10 March 2016 - 13:13:01

File

AGS15_poster.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01240443, version 1

Citation

Anaïs Gouin, Anthony Bretaudeau, K Labadie, Jean-Marc Aury, Emmanuelle D'Alençon, et al. Improvement of the assembly of heterozygous genomes of non-model organisms, a case study of the genomes of two Spodoptera frugiperda host strains. Arthropod Genomics 2015, Jun 2015, Manhattan (Kansas), United States. 2015. 〈hal-01240443〉
