Overall, we believe that Fast-SG opens the door to accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.