Test and Benchmarking of A New Scaffolding Methodology
Master Thesis, Year: 2015


Abstract

The GENSCALE scaffolding methodology is based on computing and pre-processing unitig coverage. Different modeling strategies are tested and evaluated to solve the scaffolding problem. An evaluation strategy was set up to understand why some data sets are especially challenging. Input complexity is driven by repeat content, which increases the number of unitigs, rather than by the size of the genome. Some explanations are provided for problematic data sets (disconnected graph or missing link); however, the main source of difficulty is the size of the modeled graph. A new two-step scaffolding modeling strategy is in development. It tries to break the graph complexity by first solving a graph containing only large unitigs, building something comparable to a trustworthy genomic frame. The benchmarking workflow brings together several sequence comparison tools: a tool for assessing assembly quality (Quast), a sequence aligner (MUMmer) and homemade visualization and comparison scripts (graph generator.py and graph comparato.py). Although the comparison here is against a single published scaffolder (SSPACE), this benchmarking methodology can be applied to any scaffolding solution obtained with other tools. It will be insightful to compare our methodology to recent publications such as the ScaffMatch scaffolder [27] or the Integer Linear Programming approach [28] developed at the Montpellier Laboratory of Informatics, Robotics and Microelectronics (LIRMM), which explicitly discusses its processing of repeated sequences. That article motivated the study on the Wolbachia endosymbiont, one of the organisms its authors tested. The benchmarking of our methodology highlights the major advantage of processing unitig coverage, but also the limitation of the models, which have difficulties with bigger graphs containing high-degree nodes. Overall, the project succeeded more in standardising evaluation and benchmarking strategies than in providing precise explanations for unsuccessful scaffoldings.
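The first step of the two-step strategy, restricting the graph to large unitigs, can be illustrated with a minimal sketch. The abstract does not say how the reduced graph is built, so everything below is an assumption made for illustration: the function name `large_unitig_subgraph`, the length threshold, the edge-list representation and the toy data are hypothetical and are not taken from the GENSCALE implementation.

```python
from typing import Dict, List, Set, Tuple


def large_unitig_subgraph(
    lengths: Dict[str, int],
    links: List[Tuple[str, str]],
    min_length: int,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Keep unitigs of at least min_length and the links joining two kept unitigs.

    Hypothetical helper: a plain length filter stands in for whatever
    selection rule the thesis actually uses.
    """
    kept = {name for name, length in lengths.items() if length >= min_length}
    kept_links = [(a, b) for a, b in links if a in kept and b in kept]
    return kept, kept_links


# Toy data invented for this example (unitig name -> length in bp).
lengths = {"u1": 12000, "u2": 300, "u3": 8000, "u4": 150, "u5": 15000}
# Undirected links between unitigs, e.g. supported by paired reads.
links = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"), ("u3", "u4"), ("u3", "u5")]

frame_nodes, frame_links = large_unitig_subgraph(lengths, links, min_length=5000)
print(sorted(frame_nodes))
print(frame_links)
```

In this toy case the short unitigs `u2` and `u4` drop out together with every link that touches them, leaving a smaller graph to solve first. How the remaining unitigs would then be placed relative to that frame is not described in the abstract and is left out of the sketch.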

Dates and versions

hal-01251303, version 1 (06-01-2016)

Identifiers

  • HAL Id: hal-01251303, version 1

Cite

Alexandrina Bodrug. Test and Benchmarking of A New Scaffolding Methodology. Bioinformatics [q-bio.QM]. 2015. ⟨hal-01251303⟩