Fast-SG: An Alignment-Free Algorithm for Hybrid Assembly

Overall, we believe that Fast-SG opens the door to accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.