Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs

Antoine Limasset; Jean-François Flot; Pierre Peterlongo

doi:10.1093/bioinformatics/btz102

Journal article, Bioinformatics, Year: 2019

French title (translated): Toward perfect sequences: correcting second-generation sequencing data via alignment on de Bruijn graphs

Antoine Limasset

Role: Corresponding author
PersonId : 180632
IdHAL : antoine-limasset
ORCID : 0000-0002-0669-4141
IdRef : 223503908

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Jean-François Flot

Role: Author

Evolutionary Biology and Ecology [Brussels]

Pierre Peterlongo

Role: Author
PersonId : 171998
IdHAL : pierre-peterlongo
ORCID : 0000-0003-0776-6407
IdRef : 12482062X

Scalable, Optimized and Parallel Algorithms for Genomics

Abstract

Motivation: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large data sets or treat reads as mere sequences of k-mers, without taking into account their full-length information.

Results: We propose a new method to correct short reads using de Bruijn graphs, and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered first on k-mer abundance and then on unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference onto which the reads are mapped in order to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic datasets and beyond.

Availability and Implementation
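The pipeline described in the abstract (count k-mers, discard low-abundance ones, compact the remaining graph into unitigs, then map reads onto it) can be illustrated with a toy sketch. This is not Bcool's implementation: all function names below are hypothetical, only the forward strand is handled (no reverse complements), the unitig-abundance filter is omitted, and the mapping step is a brute-force Hamming-distance search within single unitigs rather than a search over paths of the graph.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_abundance):
    """Keep k-mers seen at least min_abundance times (assumed threshold)."""
    return {kmer for kmer, c in counts.items() if c >= min_abundance}


def successors(kmer, kmers):
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]


def compact(kmers):
    """Merge non-branching paths of the de Bruijn graph into unitigs."""
    unitigs, visited = [], set()
    for kmer in sorted(kmers):
        if kmer in visited:
            continue
        # Walk left to the first k-mer of the non-branching path.
        start, seen = kmer, {kmer}
        while True:
            preds = predecessors(start, kmers)
            if len(preds) != 1:
                break
            p = preds[0]
            if len(successors(p, kmers)) != 1 or p in seen or p in visited:
                break
            start = p
            seen.add(p)
        # Extend right, appending one base per k-mer.
        unitig, cur = start, start
        visited.add(start)
        while True:
            succs = successors(cur, kmers)
            if len(succs) != 1:
                break
            n = succs[0]
            if len(predecessors(n, kmers)) != 1 or n in visited:
                break
            unitig += n[-1]
            visited.add(n)
            cur = n
        unitigs.append(unitig)
    return unitigs


def correct_read(read, unitigs, max_mismatches):
    """Replace a read by its closest same-length unitig substring, if any."""
    best = None
    for u in unitigs:
        for i in range(len(u) - len(read) + 1):
            window = u[i:i + len(read)]
            d = sum(a != b for a, b in zip(read, window))
            if d <= max_mismatches and (best is None or d < best[0]):
                best = (d, window)
    return best[1] if best else read


# Three error-free copies of a made-up 12-base "genome" plus one read
# carrying a single substitution (A -> T at position 6).
reads = ["ATGCGTACCTGA"] * 3 + ["ATGCGTTCCTGA"]
graph = compact(solid_kmers(count_kmers(reads, k=4), min_abundance=2))
corrected = [correct_read(r, graph, max_mismatches=2) for r in reads]
```

In this example the k-mers that overlap the substitution occur only once, so the abundance filter drops them and the erroneous read is corrected against the single remaining unitig. Real data requires canonical k-mers, handling of branching regions, and an indexed mapping strategy, none of which this sketch attempts.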

Domains

Bioinformatics [q-bio.QM]

Main file

main.pdf (518.07 KB)

Origin: Files produced by the author(s)

Contributor: Pierre Peterlongo

https://inria.hal.science/hal-02407243

Submitted on: Thursday, December 12, 2019, 14:06:41

Last modified on: Wednesday, January 24, 2024, 09:54:23

Long-term archiving on: Friday, March 13, 2020, 21:50:02

Dates and versions

hal-02407243, version 1 (12-12-2019)

Identifiers

HAL Id: hal-02407243, version 1
DOI: 10.1093/bioinformatics/btz102

Cite

Antoine Limasset, Jean-François Flot, Pierre Peterlongo. Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs. Bioinformatics, 2019, ⟨10.1093/bioinformatics/btz102⟩. ⟨hal-02407243⟩

Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC CRISTAL INRIA2 CRISTAL-BONSAI UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UNIV-LILLE UR1-MATH-NUM

154 views

265 downloads
