Accurate alignment of (meta)barcoding data sets using MACSE - Phylogenetics in the Genomic Era
Book chapter. Year: 2020


Abstract

Twenty years of standardized DNA barcoding practice have produced millions of sequences for a handful of molecular markers across a wide range of fungal, animal, and plant species. Despite some basic quality controls, reference barcoding data sets deposited in the Barcode of Life Data System (BOLD) database are not immune to sequencing errors and undetected pseudogenes. Such database inaccuracies can significantly bias subsequent species delimitation and biodiversity estimation based on DNA barcoding. These potential problems are amplified in metabarcoding studies, which involve thousands of sequences produced with high-throughput sequencing technologies. Here, we propose a pipeline based on MACSE v2, an extended version of our codon-aware multiple sequence alignment software that accounts for frameshifts and stop codons. The MACSE_BARCODE pipeline allows the accurate alignment of hundreds of thousands of protein-coding barcode sequences. Re-analyses of published data sets confirm that MACSE v2 automatically detects most of the sequencing errors previously identified manually. The proposed alignment strategy therefore reduces the risk of incorrect species delimitation due to the incorporation of sequencing errors or undetected pseudogenes. By applying the MACSE_BARCODE pipeline to mammal, ant, and flowering plant barcode sequences available in BOLD, we highlight several cases of database errors and provide curated reference alignments for the main protein-coding barcode genes. We anticipate our approach to be particularly useful for metabarcoding studies in which thousands of new sequences must be compared to a reference database for subsequent taxonomic assignment, for example in diet characterization studies and large-scale biodiversity assessments through environmental DNA metabarcoding.
The new MACSE_BARCODE pipeline is distributed as Nextflow workflows that are available from the MACSE project webpage (https://bioweb.supagro.inra.fr/macse/).
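The abstract rests on the idea that frameshifts and stop codons in a protein-coding barcode betray sequencing errors or pseudogenes. As a rough illustration of that signal only, the Python sketch below scans a sequence for in-frame stop codons in each forward reading frame. This is not MACSE's codon-aware alignment algorithm, which handles frameshifts and stop codons inside the alignment itself. The sketch assumes mammalian mitochondrial COI barcodes and therefore uses the stop codons of the vertebrate mitochondrial genetic code (NCBI translation table 2: TAA, TAG, AGA, AGG). The function names are hypothetical.

```python
# Naive stop-codon screen for protein-coding barcodes (illustrative sketch only,
# NOT the MACSE algorithm). Assumption: sequences are mammalian mitochondrial COI
# fragments on the forward strand, so we use the stop codons of the vertebrate
# mitochondrial genetic code (NCBI translation table 2), where TGA codes for Trp.
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def stop_codon_positions(seq: str, frame: int,
                         stops: frozenset[str] = VERT_MITO_STOPS) -> list[int]:
    """Return 0-based offsets of stop codons read in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    # Step through complete codons only: i + 3 must not exceed len(seq).
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in stops]


def open_frames(seq: str, stops: frozenset[str] = VERT_MITO_STOPS) -> list[int]:
    """Return the forward reading frames that contain no stop codon."""
    return [f for f in range(3) if not stop_codon_positions(seq, f, stops)]


def flag_suspicious(seqs: dict[str, str]) -> list[str]:
    """Hypothetical helper: names of sequences with a stop codon in every frame.

    Such sequences may carry a frameshifting sequencing error or be a pseudogene,
    but this per-sequence check is much cruder than codon-aware alignment.
    """
    return [name for name, s in seqs.items() if not open_frames(s)]


if __name__ == "__main__":
    reads = {"ok": "ATGGCCTTTAAA", "bad": "TAGAGAGG"}
    print(open_frames(reads["ok"]))   # frames 0 and 1 are stop-free
    print(flag_suspicious(reads))     # only "bad" has stops in all three frames
```

A sequence whose three forward frames all contain stop codons is a candidate for closer inspection. A frameshift near the end of a short fragment may not produce any stop codon, which is one reason why aligning sequences codon-by-codon against reference sequences can catch errors that a per-sequence scan like this one misses.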
Main file

Delsuc&Ranwez-PhyloBook-2020.pdf (13.5 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02541199, version 1 (13-04-2020)

Identifiers

  • HAL Id: hal-02541199, version 1

Cite

Frédéric Delsuc, Vincent Ranwez. Accurate alignment of (meta)barcoding data sets using MACSE. Scornavacca, Celine; Delsuc, Frédéric; Galtier, Nicolas. Phylogenetics in the Genomic Era, 2.3, No commercial publisher | Authors open access book, pp. 2.3:1–2.3:31, 2020. ⟨hal-02541199⟩