Fast-SG: an alignment-free algorithm for hybrid assembly

Abstract:

Background: Long-read sequencing technologies are the ultimate solution for resolving genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intensive and require considerable coverage, which hinders their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes.

Results: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using lightweight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show that Fast-SG outperforms state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show that a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878).

Conclusions: Fast-SG opens the door to accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
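The abstract does not spell out how the scaffolding graph is built, so the following is only a generic illustration of alignment-free linking, not the Fast-SG implementation. It assumes exact lookups of k-mers that occur once across the contigs, ignores reverse complements, orientation, and insert sizes, and uses hypothetical names (`unique_kmers`, `locate`, `scaffolding_graph`) and toy sequences.

```python
from collections import Counter


def unique_kmers(contigs, k):
    """Map each k-mer seen exactly once across all contigs to (contig, offset)."""
    seen = Counter()
    where = {}
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            seen[kmer] += 1
            where[kmer] = (name, i)
    return {kmer: pos for kmer, pos in where.items() if seen[kmer] == 1}


def locate(read, index, k):
    """Return the contig with the most unique k-mer hits for this read, or None."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        hit = index.get(read[i:i + k])
        if hit is not None:
            hits[hit[0]] += 1
    return hits.most_common(1)[0][0] if hits else None


def scaffolding_graph(contigs, read_pairs, k=5):
    """Count read pairs whose two ends land on different contigs.

    Assumption: an edge is an unordered contig pair weighted by the number
    of supporting read pairs; no alignment is computed at any point.
    """
    index = unique_kmers(contigs, k)
    edges = Counter()
    for left, right in read_pairs:
        a = locate(left, index, k)
        b = locate(right, index, k)
        if a is not None and b is not None and a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    contigs = {"c1": "ACGTACGGTCA", "c2": "TTGACCGATAG"}
    pairs = [("CGTACGG", "GACCGAT"), ("ACGTAC", "CGTACG")]
    print(dict(scaffolding_graph(contigs, pairs, k=5)))
```

In this sketch only the first pair spans two contigs, so it contributes the single edge between `c1` and `c2`; the second pair falls entirely within `c1` and adds nothing. Synthetic pairs cut from long reads could be fed through the same function, which is one way to read the abstract's claim that the graph can be built from either read type.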
Document type: Journal article

Cited literature: 49 references

https://hal.inria.fr/hal-01842462
Contributor: Marie-France Sagot
Submitted on: Wednesday, July 18, 2018 - 12:21:58 PM
Last modification on: Thursday, November 21, 2019 - 1:14:17 PM
Long-term archiving on: Friday, October 19, 2018 - 5:38:04 PM

File: giy048.pdf (produced by the author(s))

Citation

Alex Genova, Gonzalo Ruz, Marie-France Sagot, Alejandro Maass. Fast-SG: an alignment-free algorithm for hybrid assembly. GigaScience, BioMed Central, 2018, 7 (5), pp. 1-15. ⟨10.1093/gigascience/giy048⟩. ⟨hal-01842462⟩
