6.2.1 Overview
Computing overlap piles and windows
6.2.5.2 Generalization of the segmentation strategy
6.2.5.3 Consensus refinement with a de Bruijn graph
6.3.4 Comparison with the state of the art on simulated data
7.1 Introduction
Context. Third-generation sequencing technologies have evolved considerably since their introduction in 2011. In particular, read error rates, which reached 15 to 30% in the first sequencing experiments, have been greatly
3.1.2 Sequencing depth of 60x
This chapter presents the general conclusion of this thesis, together with its perspectives, proposed both at the level of the individual results described
HG-CoLoR: enHanced de Bruijn Graph for the error Correction of Long Reads, Journées Ouvertes en Biologie, Informatique et Mathématiques (JOBIM), 2017.
CONSENT: Scalable self-correction of long reads with multiple sequence alignment, Journées Ouvertes en Biologie, Informatique et Mathématiques (JOBIM), 2019.
Communications in international workshops
Enhanced de Bruijn Graphs, Mathematic foundations in Bioinformatics (MatBio), 2017.
Hybrid correction of long reads using a variable-order de Bruijn graph, Data Structures in Bioinformatics (DSB), 2018.
List of publications and communications
CONSENT: Scalable self-correction of long reads with multiple sequence alignment, Data Structures in Bioinformatics (DSB), 2019.
Communications in national workshops
ELECTOR: EvaLuator of Error Correction Tools for lOng Reads, SeqBio, 2018.
HG-CoLoR: enHanced de Bruijn Graph for the error Correction of Long Reads, 2017.
Invited conferences and seminars
Correction de données de séquençage de troisième génération. Séminaire d'informatique théorique, 2019.
Diverses approches pour l'auto-correction des lectures longues, Séminaire Symbiose de l'Inria, 2017.
ELECTOR: EvaLuation of Error Correction Tools for lOng Reads, Journées Ouvertes en Biologie, Informatique et Mathématiques (JOBIM), 2018.
ELECTOR: Evaluator for long reads correction methods. bioRxiv, 2019.
CONSENT: Scalable self-correction of long reads with multiple sequence alignment, 2019.
Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data, Bioinformatics, vol.31, pp.3421-3428, 2015.
Basic local alignment search tool, Journal of Molecular Biology, vol.215, pp.403-413, 1990.
ReAligner: A Program for Refining DNA Sequence Multi-Alignments, Journal of Computational Biology, vol.4, pp.369-383, 1997.
Improving PacBio Long Read Accuracy by Short Read Alignment, PLoS ONE, vol.7, issue.10, pp.1-8, 2012.
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing, Journal of Computational Biology, vol.19, pp.455-477, 2012.
HALC: High throughput algorithm for long read error correction, BMC Bioinformatics, vol.18, p.204, 2017.
FLAS: fast and high-throughput algorithm for PacBio long-read self-correction, Bioinformatics, 2019.
An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes. Inequalities III: Proceedings of the Third Symposium on Inequalities, pp.1-8, 1972.
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing, Nature Biotechnology, vol.33, pp.623-630, 2015.
Variable-Order De Bruijn Graphs, Proceedings of the 2015 Data Compression Conference, pp.383-392, 2015.
Succinct de Bruijn Graphs, Algorithms in Bioinformatics: 12th International Workshop, WABI 2012, pp.225-235, 2012.
Algorithm 457: Finding All Cliques of an Undirected Graph, Communications of the ACM, vol.16, pp.575-577, 1973.
A combinatorial problem, Proceedings of the Section of Sciences of the Koninklijke Nederlandse Akademie van Wetenschappen te Amsterdam, vol.7, pp.758-764, 1946.
A block-sorting lossless data compression algorithm, 1994.
Comparison of mapping algorithms used in high-throughput sequencing: Application to Ion Torrent data, BMC Genomics, vol.15, pp.1-16, 2014.
Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory, BMC Bioinformatics, vol.13, p.238, 2012.
Space-efficient and exact de Bruijn graph representation based on a Bloom filter, Algorithms for Molecular Biology, vol.2, pp.1-9, 2013.
Phased diploid genome assembly with single-molecule real-time sequencing, Nature Methods, vol.13, pp.1050-1054, 2016.
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data, Nature Methods, vol.10, pp.563-569, 2013.
HECIL: A hybrid error correction algorithm for long reads with iterative learning, Scientific Reports, vol.8, issue.1, pp.1-9, 2018.
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants, Nucleic Acids Research, vol.38, pp.1767-1771, 2009.
SHRiMP2: Sensitive yet practical short read mapping, Bioinformatics, vol.27, pp.1011-1012, 2011.
A Note on Two Problems in Connexion with Graphs, Numerische Mathematik, vol.1, issue.1, pp.269-271, 1959.
Listing All Maximal Cliques in Sparse Graphs in Near-Optimal Time. Algorithms and Computation, pp.403-414, 2010.
Listing All Maximal Cliques in Large Sparse Real-World Graphs. Experimental Algorithms, pp.364-375, 2011.
Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment, Genome Research, vol.8, pp.175-185, 1998.
Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities, Genome Research, vol.8, pp.186-194, 1998.
Opportunistic data structures with applications, Proceedings of the 41st Annual Symposium on Foundations of Computer Science FOCS '00, pp.390-398, 2000.
Hercules: a profile HMM-based hybrid error correction algorithm for long reads, Nucleic Acids Research, vol.46, 2018.
Algorithm 97: Shortest Path, Communications of the ACM, vol.5, p.345, 1962.
Initial sequencing and analysis of the human genome, Nature, vol.409, pp.860-921, 2001.
Normal Recurring Decimals, Journal of the London Mathematical Society, pp.167-169, 1946.
Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome, Genome Research, vol.25, pp.1750-1756, 2015.
Coming of age: Ten years of next-generation sequencing technologies, Nature Reviews Genetics, vol.17, pp.333-351, 2016.
Performance comparison of second and third-generation sequencers using a bacterial genome with two chromosomes, BMC Genomics, vol.15, p.699, 2014.
Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems, Bioinformatics, vol.20, pp.1546-1556, 2004.
The history of DNA sequencing, Journal of Medical Biochemistry, vol.32, pp.301-312, 2013.
Proovread: Large-scale high-accuracy PacBio correction through iterative short read consensus, Bioinformatics, vol.30, pp.3004-3011, 2014.
Correcting Long Reads by Mapping short reads, Bioinformatics, vol.32, pp.545-551, 2016.
Improving quality of high-throughput sequencing reads, 2015.
BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads, Bioinformatics, vol.30, pp.1354-1362, 2014.
LSCplus: a fast solution for improving long read accuracy by short read alignment, BMC Bioinformatics, vol.17, p.451, 2016.
ART: a next-generation sequencing read simulator, Bioinformatics, vol.28, pp.593-594, 2012.
HiTEC: Accurate error correction in high-throughput sequencing data, Bioinformatics, vol.27, pp.295-302, 2011.
ABySS 2.0: Resource-Efficient Assembly of Large Genomes using a Bloom Filter, Genome Research, vol.27, pp.768-777, 2017.
Nanopore sequencing and assembly of a human genome with ultra-long reads, Nature Biotechnology, vol.36, p.338, 2018.
ABySS: A parallel assembler for short read sequence data, Genome Research, vol.19, pp.1117-1123, 2009.
ECHO: A reference-free short-read error correction algorithm, Genome Research, vol.21, pp.1181-1192, 2011.
An Error Correction and De-Novo Assembly Approach for Nanopore Reads Using Short Reads, Current Bioinformatics, vol.13, pp.241-252, 2018.
Generations of Sequencing Technologies: From First to Next Generation, 2017.
Quake: quality-aware detection and correction of sequencing errors, Genome Biology, vol.11, issue.11, 2010.
BLAT: The BLAST-Like Alignment Tool, Genome Research, vol.12, pp.656-664, 2002.
Adaptive seeds tame genomic sequence comparison, Genome Research, vol.21, pp.487-493, 2011.
KMC3: counting and manipulating k-mer statistics, Bioinformatics, vol.33, pp.2759-2761, 2017.
Reducing assembly complexity of microbial genomes with single-molecule sequencing, Genome Biology, vol.14, issue.9
Hybrid error correction and de novo assembly of single-molecule sequencing reads, Nature Biotechnology, vol.30, pp.693-700, 2012.
Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation, Genome Research, vol.27, pp.722-736, 2017.
Indexing Arbitrary-Length k-Mers in Sequencing Reads, PLoS ONE, vol.10, pp.1-16, 2015.
Reducing the space requirement of suffix trees, vol.13, pp.1149-1171, 1999.
Versatile and open software for comparing large genomes, Genome Biology, vol.5, 2004.
LRCstats, a tool for evaluating long reads correction methods, Bioinformatics, vol.33, pp.3652-3654, 2017.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology, vol.10, 2009.
Fast gapped-read alignment with Bowtie 2, Nature Methods, vol.9, pp.357-359, 2012.
Generating consensus sequences from partial order multiple sequence alignment graphs, Bioinformatics, vol.19, pp.999-1008, 2003.
Multiple sequence alignment using partial order graphs, Bioinformatics, vol.18, pp.452-464, 2002.
Error correction and assembly complexity of single molecule sequencing reads, bioRxiv, p.6395, 2014.
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, 2013.
Minimap and miniasm: Fast mapping and de novo assembly for noisy long sequences, Bioinformatics, vol.32, pp.2103-2110, 2016.
Minimap2: pairwise alignment for nucleotide sequences, Bioinformatics, vol.34, pp.3094-3100, 2018.
Fast and accurate long-read alignment with Burrows-Wheeler transform, Bioinformatics, vol.26, pp.589-595, 2010.
Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, vol.25, pp.1754-1760, 2009.
De novo assembly of human genomes with massively parallel short read sequencing, Genome Research, vol.20, pp.265-272, 2010.
DeepSimulator: A deep simulator for Nanopore sequencing, Bioinformatics, vol.34, pp.2899-2908, 2018.
Comparative assessment of long-read error-correction software applied to RNA-sequencing data, 2019.
Manifold de Bruijn Graphs, Algorithms in Bioinformatics: 14th International Workshop, WABI 2014, pp.296-310, 2014.
Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data, Bioinformatics, vol.29, pp.308-315, 2013.
BLAST+: architecture and applications, BMC Bioinformatics, vol.10, p.421, 2009.
Genome assembly using Nanopore-guided long and error-free DNA reads, BMC Genomics, vol.16, p.327, 2015.
Commet: Comparing and combining multiple metagenomic datasets, IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2014.
Suffix Arrays: A New Method for On-line String Searches, Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms SODA '90, pp.319-327, 1990.
Jellyfish: A fast k-mer counter, pp.1-8, 2012.
QuorUM: An Error Corrector for Illumina Reads, PLoS ONE, vol.10, pp.1-13, 2015.
From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing, 2018.
yacrd and fpa: upstream tools for long-read genome assembly, 2019.
A new method for sequencing DNA, Proceedings of the National Academy of Sciences of the United States of America, vol.74, pp.99-103, 1977.
Jabba: hybrid error correction for long sequencing reads, Algorithms for Molecular Biology, vol.11, p.10, 2016.
Versatile genome assembly evaluation with QUAST-LG, Bioinformatics, vol.34, pp.142-150, 2018.
Fast and accurate read alignment for resequencing, Bioinformatics, vol.28, pp.2366-2373, 2012.
A whole-genome assembly of Drosophila, Science, vol.287, pp.2196-2204, 2000.
Efficient Local Alignment Discovery amongst Noisy Long Reads, Algorithms in Bioinformatics, pp.52-67, 2014.
A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, vol.48, pp.443-453, 1970.
Comparative genome assembly, Briefings in Bioinformatics, vol.5, pp.237-248, 2004.
High-throughput sequencing technologies, Molecular Cell, vol.58, pp.586-597, 2015.
MetaSim: A Sequencing Simulator for Genomics and Metagenomics. Handbook of Molecular Microbial Ecology I: Metagenomics and Complementary Approaches 3, vol.10, pp.417-421, 2011.
DSK: K-mer counting with very low memory usage, Bioinformatics, vol.29, pp.652-653, 2013.
Question 48. L'Intermédiaire des Mathématiciens, vol.1, pp.107-110, 1894.
Correction of sequencing errors in a mixed set of reads, Bioinformatics, vol.26, 2010.
Accurate and efficient long read error correction, Bioinformatics, vol.30, pp.3506-3514, 2014.
URL: https://hal.archives-ouvertes.fr/lirmm-01100451
Correcting errors in short reads by multiple alignments, Bioinformatics, vol.27, pp.1455-1461, 2011.
Accurate self-correction of errors in long reads using de Bruijn graphs, Bioinformatics, vol.33, pp.799-806, 2017.
DNA sequencing with chain-terminating inhibitors, Proceedings of the National Academy of Sciences of the United States of America, vol.74, pp.5463-5467, 1977.
Longest Increasing and Decreasing Subsequences, Canadian Journal of Mathematics, vol.13, pp.179-191, 1961.
SHREC: A short-read error correction method, Bioinformatics, vol.25, pp.2157-2163, 2009.
Piercing the dark matter: bioinformatics of long-range sequencing and mapping, Nature Reviews Genetics, vol.19, pp.329-346, 2018.
Accurate detection of complex structural variations using single-molecule sequencing, Nature Methods, vol.15, pp.461-468, 2018.
DNA sequencing at 40: Past, present and future, Nature, vol.550, pp.345-353, 2017.
Next-generation DNA sequencing, Nature Biotechnology, vol.26, pp.1135-1145, 2008.
Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, pp.195-197, 1981.
SimLoRD: Simulation of Long Read Data, Bioinformatics, vol.32, pp.2704-2706, 2016.
Non Hybrid Long Read Consensus Using Local De Bruijn Graph Assembly, 2017.
Fast and accurate de novo genome assembly from long uncorrected reads, Genome Research, vol.27, pp.737-746, 2017.
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, vol.13, pp.260-269, 1967.
Genome sequencing in microfabricated high-density picolitre reactors, Nature, vol.437, pp.376-380, 2005.
EssaMEM: Finding maximal exact matches using enhanced sparse suffix arrays, Bioinformatics, vol.29, pp.802-804, 2013.
Hybrid long read error correction using an FM-index, BMC Bioinformatics, vol.19, pp.1-11, 2018.
A Theorem on Boolean Matrices, Journal of the ACM, vol.9, pp.11-12, 1962.
Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid, Nature, vol.171, pp.737-738, 1953.
NPBSS: A new PacBio sequencing simulator for generating the continuous long reads with an empirical model, BMC Bioinformatics, vol.19, pp.1-9, 2018.
Linear pattern matching algorithms, Switching and Automata Theory, 1973. SWAT '08. IEEE Conference Record of 14th Annual Symposium on, pp.1-11, 1973.
The Sequence Alignment/Map format and SAMtools, Bioinformatics, vol.25, pp.2078-2079, 2009.
Fast mapping, error correction, and de novo assembly for single-molecule sequencing reads, Nature Methods, vol.14, pp.1072-1074, 2017.
NanoSim: Nanopore sequence read simulator based on statistical characterization, 2017.
A survey of error-correction methods for next-generation sequencing, Briefings in Bioinformatics, vol.14, pp.56-66, 2013.
Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads, PeerJ, vol.4, 2016.