Book section, Year: 2020

Accurate alignment of (meta)barcoding data sets using MACSE


Abstract

Twenty years of standardized DNA barcoding practice have resulted in millions of sequences being produced for a handful of molecular markers in a wide range of fungal, animal, and plant species. Despite some basic quality controls, reference barcoding data sets deposited in the Barcode of Life Datasystem (BOLD) database are not immune to sequencing errors and undetected pseudogenes. Such database inaccuracies can significantly bias subsequent species delimitation and biodiversity estimation based on DNA barcoding. These potential problems are amplified in metabarcoding studies containing thousands of sequences produced using high-throughput sequencing technologies. Here, we propose a pipeline based on MACSE v2, an extended version of our codon-aware multiple sequence alignment software accounting for frameshifts and stop codons. The MACSE_BARCODE pipeline allows the accurate alignment of hundreds of thousands of protein-coding barcode sequences. Re-analyses of published data sets confirm that MACSE v2 is able to automatically detect most sequencing errors previously identified manually. The proposed alignment strategy hence alleviates the risk of incorrect species delimitation due to the incorporation of sequencing errors or undetected pseudogenes. By applying the MACSE_BARCODE pipeline to mammal, ant, and flowering plant barcode sequences available in BOLD, we highlight several cases of database errors and provide curated reference alignments for the main protein-coding barcode genes. We anticipate our approach to be particularly useful for metabarcoding studies in which thousands of new sequences need to be compared to a reference database for subsequent taxonomic assignment. This might prove particularly helpful for diet characterization studies and large-scale biodiversity assessments through environmental DNA metabarcoding.
The new MACSE_BARCODE pipeline is distributed as Nextflow workflows that are available from the MACSE project webpage (https://bioweb.supagro.inra.fr/macse/).
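
To make the idea of stop codons as a warning sign concrete, here is a minimal Python sketch of a naive pre-screen that counts in-frame stop codons in a barcode fragment. It is not MACSE and does not reproduce how MACSE_BARCODE works: MACSE handles frameshifts and stop codons during alignment, whereas this sketch only checks the three forward reading frames of an unaligned sequence. The function names (`internal_stops`, `flag_suspicious`) are hypothetical, and the stop codon set assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), which would suit mammal COI sequences but not plant markers.

```python
# Illustrative sketch only; function names are hypothetical and not part of MACSE.
# Assumption: vertebrate mitochondrial genetic code (NCBI translation table 2),
# in which TGA encodes tryptophan and AGA/AGG are stop codons.
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def internal_stops(seq, frame=0, stops=VERT_MITO_STOPS):
    """Return 0-based nucleotide positions of stop codons in the given frame."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps, if any
    positions = []
    # Step through complete codons only (i + 3 <= len(seq)).
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            positions.append(i)
    return positions


def flag_suspicious(sequences, stops=VERT_MITO_STOPS):
    """Map sequence id -> minimum stop count, for sequences with a stop in every frame.

    A barcode fragment is an internal piece of a coding gene, so at least one
    forward frame is expected to be free of stop codons.
    """
    flagged = {}
    for seq_id, seq in sequences.items():
        counts = [len(internal_stops(seq, frame, stops)) for frame in range(3)]
        if min(counts) > 0:
            flagged[seq_id] = min(counts)
    return flagged


if __name__ == "__main__":
    toy = {"ok": "ATGGCATGAGGC", "bad": "TAAATAAATAAA"}
    print(flag_suspicious(toy))
```

A check like this cannot tell a sequencing error from a pseudogene, and a single insertion or deletion would shift the frame without necessarily producing a stop in every frame, which is why a codon-aware alignment is needed for the cases the abstract describes. For other markers the `stops` argument would need to match the relevant genetic code.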
Main file
Delsuc&Ranwez-PhyloBook-2020.pdf (13.5 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02541199, version 1 (13-04-2020)

Identifiers

  • HAL Id: hal-02541199, version 1

Cite

Frédéric Delsuc, Vincent Ranwez. Accurate alignment of (meta)barcoding data sets using MACSE. Scornavacca, Celine; Delsuc, Frédéric; Galtier, Nicolas. Phylogenetics in the Genomic Era, 2.3, No commercial publisher | Authors open access book, pp. 2.3:1–2.3:31, 2020. ⟨hal-02541199⟩