Test and Benchmarking of A New Scaffolding Methodology

Alexandrina Bodrug

Student theses -- Hal-Inria+ Year: 2015

Alexandrina Bodrug

Role: Author

Scalable, Optimized and Parallel Algorithms for Genomics

Abstract

The GENSCALE scaffolding methodology is based on computing and pre-processing unitig coverage. Different modeling strategies are tested and evaluated to solve the scaffolding problem. An evaluation strategy was set up to understand why some data sets are especially challenging. Input data complexity is driven by repeat content, which increases the number of unitigs, rather than by genome size. Some explanations are provided for problematic data sets (disconnected graph or missing links); however, the main source of difficulty is the size of the modeled graph. A new two-step scaffolding modeling strategy is in development. It tries to reduce graph complexity by first solving a graph containing only large unitigs, building something comparable to a trustworthy genomic frame. The benchmarking workflow brings together several sequence comparison tools: a tool for assessing assembly quality (Quast), a sequence aligner (MUMmer), and homemade visualization and comparison scripts (graph generator.py and graph comparato.py). Although the comparison here is limited to a single published scaffolder (SSPACE), this methodology can be applied to any scaffolding solution obtained with other tools. It will be insightful to compare our methodology to recent publications such as the ScaffMatch [27] scaffolder or the Integer Linear Programming approach [28] developed at the Montpellier Laboratory of Informatics, Robotics and Microelectronics (LIRMM), which explicitly discusses its processing of repeated sequences. Their article motivated the study on the Wolbachia endosymbiont, one of their tested organisms. The benchmarking of our methodology highlights the major advantage of processing unitig coverage, but also the limitation of the models, which have difficulty with bigger graphs containing high-degree nodes. Overall, the project succeeded more in standardising evaluation and benchmarking strategies than in providing precise explanations for unsuccessful scaffoldings.
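The first step of the two-step strategy described in the abstract, restricting the graph to large unitigs, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the thesis: the `Unitig` record, the `min_length` threshold, and in particular the coverage-to-multiplicity rule (coverage divided by an assumed single-copy depth), which is only one plausible way of using unitig coverage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unitig:
    """Hypothetical unitig record (field names are assumptions)."""
    name: str
    length: int       # length in base pairs
    coverage: float   # mean read depth over the unitig


def estimate_multiplicity(unitig, base_coverage):
    """Assumed rule: copies ~ coverage / single-copy depth, at least 1."""
    return max(1, round(unitig.coverage / base_coverage))


def large_unitig_subgraph(unitigs, links, min_length):
    """Keep unitigs of at least min_length and the links joining two kept unitigs."""
    kept = {u.name for u in unitigs if u.length >= min_length}
    kept_links = [(a, b) for a, b in links if a in kept and b in kept]
    return kept, kept_links


# Toy data: u2 is short and has roughly twice the assumed single-copy depth.
unitigs = [
    Unitig("u1", 12000, 30.0),
    Unitig("u2", 400, 61.0),
    Unitig("u3", 8000, 29.0),
]
links = [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]

kept, kept_links = large_unitig_subgraph(unitigs, links, min_length=1000)
```

In this toy case the short, likely repeated unitig `u2` is set aside, leaving a smaller graph over `u1` and `u3` that could serve as the frame to which the remaining unitigs are added in a second step.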

Domains

Bioinformatics [q-bio.QM]

Main file

Testing.pdf (1.57 MB)

Contributor: Rumen Andonov

https://inria.hal.science/hal-01251303

Submitted on: Wednesday, 6 January 2016, 14:35:32

Last modified on: Friday, 24 March 2023, 14:53:01

Dates and versions

hal-01251303, version 1 (06-01-2016)

Identifiers

HAL Id: hal-01251303, version 1

Cite

Alexandrina Bodrug. Test and Benchmarking of A New Scaffolding Methodology. Bioinformatics [q-bio.QM]. 2015. ⟨hal-01251303⟩

Collections

INSTITUT-TELECOM CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC IRISA-D7 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

184 views

216 downloads
