Test and Benchmarking of A New Scaffolding Methodology

Alexandrina Bodrug 1
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique, IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract: The GENSCALE scaffolding methodology is based on computing and pre-processing unitig coverage. Different modeling strategies are tested and evaluated to solve the scaffolding problem. An evaluation strategy was set up to understand why some data sets are especially challenging. Input complexity stems from repeat content, which increases the number of unitigs, rather than from genome size. Some explanations are provided for problematic data sets (disconnected graph or missing link); however, the main source of difficulty is the size of the modeled graph. A new two-step scaffolding modeling strategy is in development. It tries to reduce graph complexity by first solving a graph containing only large unitigs, building something comparable to a trustworthy genomic frame. The benchmarking workflow brings together several sequence comparison tools: a tool for assessing assembly quality (Quast), a sequence aligner (MUMmer), and homemade visualization and comparison scripts (graph generator.py and graph comparato.py). Although the comparison here is against a single published scaffolder (SSPACE), this methodology can be applied to any scaffolding solution obtained with other tools. Comparing our methodology to recent publications, such as the ScaffMatch [27] scaffolder or the Integer Linear Programming approach [28] developed at the Montpellier Laboratory of Informatics, Robotics and Microelectronics (LIRMM), which explicitly discusses its repeated sequence processing, will be insightful. Their article was the motivation behind the study on Wolbachia Endosymbiont, one of their tested organisms. The benchmarking of our methodology highlights the major advantage of processing unitig coverage but also the limitation of the models, which struggle with larger graphs containing high-degree nodes. Overall, the project succeeded more in standardising evaluation and benchmarking strategies than in providing precise explanations for unsuccessful scaffoldings.
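The first step of the two-step strategy described in the abstract (restricting the graph to large unitigs) can be illustrated with a minimal Python sketch. This is not the GENSCALE implementation: the function names, the `min_length` threshold, and the toy unitig data are all hypothetical, and the graph is reduced to an undirected adjacency map with no orientation, distance, or coverage information. The sketch also reports two properties the abstract names as sources of difficulty: disconnected graphs and high-degree nodes.

```python
def large_unitig_subgraph(lengths, links, min_length):
    """Keep only unitigs of at least min_length and the links between them.

    lengths: dict mapping unitig name -> length in bp (hypothetical input format)
    links:   iterable of (unitig_a, unitig_b) pairs, e.g. from paired reads
    """
    kept = {name for name, size in lengths.items() if size >= min_length}
    adjacency = {name: set() for name in kept}
    for a, b in links:
        if a in kept and b in kept and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of an undirected adjacency map."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def max_degree(adjacency):
    """Largest number of neighbours of any node (0 for an empty graph)."""
    return max((len(neighbours) for neighbours in adjacency.values()), default=0)


# Toy data, invented for illustration only.
lengths = {"u1": 5000, "u2": 120, "u3": 8000, "u4": 4000, "u5": 90}
links = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"), ("u4", "u5")]

frame = large_unitig_subgraph(lengths, links, min_length=1000)
components = connected_components(frame)
print(len(frame), "large unitigs,", len(components), "components,",
      "max degree", max_degree(frame))
```

In this toy case, filtering out the short unitigs leaves `u4` isolated, which is the kind of disconnected graph the abstract lists among problematic inputs. A real first step would still have to order and orient the retained unitigs.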
Document type:
Student theses -- Hal-inria+
Bioinformatics [q-bio.QM]. 2015
Cited literature: 31 references

https://hal.inria.fr/hal-01251303
Contributor: Rumen Andonov
Submitted on: Wednesday, January 6, 2016 - 14:35:32
Last modified on: Wednesday, May 16, 2018 - 11:23:35

Identifiers

  • HAL Id: hal-01251303, version 1

Citation

Alexandrina Bodrug. Test and Benchmarking of A New Scaffolding Methodology. Bioinformatics [q-bio.QM]. 2015. 〈hal-01251303〉

Metrics

Record views

246

File downloads

149