Master thesis

Test and Benchmarking of A New Scaffolding Methodology

Alexandrina Bodrug 1
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Abstract : The GENSCALE scaffolding methodology is based on computing and pre-processing unitig coverage. Different modeling strategies are tested and evaluated to solve the scaffolding problem. An evaluation strategy was set up to understand why some data sets are especially challenging. Complex input data is caused by the repeat content, which drives the number of unitigs, rather than by the size of the genome. Some explanations are provided for problematic data sets (disconnected graph or missing link); however, the main source of difficulty is the size of the modeled graph. A new two-step scaffolding modeling strategy is in development. It tries to break the graph complexity by first solving a graph containing only large unitigs, building something comparable to a trustworthy genomic frame. The benchmarking workflow brings together several sequence comparison tools: a tool for assessing assembly quality (Quast), a sequence aligner (MUMmer) and homemade visualization and comparison scripts (graph and graph [...]). Although our results were compared against only a single published scaffolder (SSPACE), this benchmarking methodology can be applied to any scaffolding solution obtained with other tools. Comparing our methodology to recent publications will be insightful, such as the ScaffMatch scaffolder [27] or the Integer Linear Programming approach [28] developed at the Montpellier Laboratory of Informatics, Robotics and Microelectronics (LIRMM), which explicitly discusses its processing of repeated sequences. Their article was the motivation behind the study on the Wolbachia endosymbiont, one of their tested organisms. The benchmarking of our methodology highlights the major advantage of processing unitig coverage, but also the limitation of the models, which have difficulties with bigger graphs containing high-degree nodes. Overall, the project succeeded more in standardising evaluation and benchmarking strategies than in providing precise explanations for unsuccessful scaffoldings.
Document type :
Master thesis

Cited literature [31 references]
Contributor : Rumen Andonov
Submitted on : Wednesday, January 6, 2016 - 2:35:32 PM
Last modification on : Tuesday, October 19, 2021 - 11:58:56 PM


  • HAL Id : hal-01251303, version 1


Alexandrina Bodrug. Test and Benchmarking of A New Scaffolding Methodology. Bioinformatics [q-bio.QM]. 2015. ⟨hal-01251303⟩


