Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs

Antoine Limasset 1, 2, Jean-François Flot 1, Pierre Peterlongo 2
2 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large data sets or treat reads as mere sequences of k-mers, without taking into account their full-length information. We propose a new method to correct short reads using de Bruijn graphs, and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered first by k-mer abundance, then by unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference on which the reads are mapped to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic datasets and beyond. The implementation is open source and available at http://github.com/Malfoy/BCOOL under the Affero GPL license.
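The pipeline outlined in the abstract (build a graph from the reads, filter it by abundance, map the reads back onto it) can be illustrated with a toy Python sketch. This is not Bcool's implementation: the function names, the `min_abundance` and `max_mismatches` parameters, and the greedy anchoring on the first k-mer of the read are all assumptions made for illustration. The sketch also skips unitig compaction, the unitig-abundance filter, and reverse complements.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_abundance):
    """Keep k-mers seen at least min_abundance times (hypothetical threshold)."""
    return {kmer for kmer, n in counts.items() if n >= min_abundance}


def successors(kmer, solid):
    """Solid k-mers that overlap kmer by k-1 bases: the implicit graph edges."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in solid]


def correct_read(read, solid, k, max_mismatches=2):
    """Map a read on the filtered graph and return the closest graph path.

    Assumption: the first k-mer of the read is used as the anchor; a read
    whose anchor is not solid, or that has no path within the mismatch
    budget, is returned unchanged.
    """
    anchor = read[:k]
    if len(read) < k or anchor not in solid:
        return read
    best = None  # (mismatch count, corrected sequence)
    stack = [(anchor, 0)]
    while stack:
        path, mismatches = stack.pop()
        if len(path) == len(read):
            if best is None or mismatches < best[0]:
                best = (mismatches, path)
            continue
        for nxt in successors(path[-k:], solid):
            base = nxt[-1]
            m = mismatches + (base != read[len(path)])
            if m <= max_mismatches:
                stack.append((path + base, m))
    return best[1] if best else read


if __name__ == "__main__":
    genome = "ACGTTGCATGCCAGTA"
    noisy = "ACGTTGCATGACAGTA"  # one substitution at position 10
    reads = [genome] * 3 + [noisy]
    solid = solid_kmers(count_kmers(reads, 5), min_abundance=2)
    print(correct_read(noisy, solid, 5))
```

In this toy data set the k-mers created by the substitution are seen only once, so the abundance filter removes them and the noisy read is rewritten along the only remaining graph path.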
Document type :
Conference papers

https://hal.inria.fr/hal-01644163
Contributor : Pierre Peterlongo
Submitted on : Wednesday, November 22, 2017 - 8:49:30 AM
Last modification on : Friday, September 13, 2019 - 9:49:21 AM

Identifiers

  • HAL Id : hal-01644163, version 1
  • ARXIV : 1711.03336

Citation

Antoine Limasset, Jean-François Flot, Pierre Peterlongo. Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs. RECOMB 2018, Apr 2018, Paris, France. ⟨hal-01644163⟩
