VCF_creator: Mapping and VCF Creation features in DiscoSnp++

Chloé Riou¹, Claire Lemaitre¹, Pierre Peterlongo¹
¹ GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
IRISA-D7 - Data and Knowledge Management, Inria Rennes – Bretagne Atlantique
Abstract: DiscoSnp++ is designed to detect genomic variants such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) from one or more raw read sets without any reference genome. This de novo method finds variants within or between individuals, and is particularly suited to non-model organisms, for which a reference genome is often unavailable or of poor quality. Because of their number and their distribution along the genome, these markers are used in many areas of biology: agronomy, health, medicine, and the environment. To facilitate downstream analysis and selection of these variants, we propose VCF_creator, a new feature of DiscoSnp++. Starting from the DiscoSnp++ predictions and a reference genome, VCF_creator aligns the predictions to the reference and outputs the variants in VCF (Variant Call Format), the text file format commonly used to report variants. The pipeline is as follows. First, all predictions are aligned to the reference with a mapping tool (BWA). VCF_creator then analyses the mapping results to extract the mapping positions and to distinguish unique mappings from multiple ones. A prediction is validated as unique if there exists a distance, in terms of number of substitutions, for which both alleles of the variant have a unique mapping position on the reference genome. For each variant, the VCF output file gives the genomic position and the name of the sequence to which it aligns, the reference and alternative alleles, and the DiscoSnp++ information (coverage for each dataset, genotyping, rank, ...). VCF_creator was applied to data simulated from human chromosome 1. The results show that the majority of false positives predicted by DiscoSnp++ correspond to unaligned or multiply aligned predictions. Therefore, this tool not only makes downstream analyses easier but also improves the precision of DiscoSnp++ predictions.
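
The uniqueness criterion stated in the abstract can be illustrated with a minimal Python sketch. It is not the VCF_creator implementation. The `Hit` record, the `classify_prediction` function, the `max_subs` bound and the three status labels are hypothetical names introduced here. The sketch also assumes that "a distance" means "at most d substitutions", an interpretation the abstract does not spell out.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One mapping position reported for an allele (hypothetical record)."""
    chrom: str
    pos: int
    subs: int  # number of substitutions between the allele and the reference


def hits_within(hits: List[Hit], d: int) -> List[Hit]:
    """Keep the hits with at most d substitutions (assumed interpretation)."""
    return [h for h in hits if h.subs <= d]


def classify_prediction(upper: List[Hit], lower: List[Hit], max_subs: int = 3) -> str:
    """Classify a two-allele prediction from the hits of each allele.

    Returns "unique" if some distance d in 0..max_subs gives exactly one
    mapping position for each allele, "unmapped" if neither allele has
    any hit, and "not_unique" otherwise.
    """
    if not upper and not lower:
        return "unmapped"
    for d in range(max_subs + 1):
        if len(hits_within(upper, d)) == 1 and len(hits_within(lower, d)) == 1:
            return "unique"
    return "not_unique"


# A SNP whose first allele matches the reference exactly and whose second
# allele differs by one substitution is unique at d = 1.
snp = classify_prediction([Hit("chr1", 100, 0)], [Hit("chr1", 100, 1)])

# A prediction falling in a two-copy repeat never has a single position
# per allele, whatever the distance.
repeat = classify_prediction(
    [Hit("chr1", 100, 0), Hit("chr1", 5000, 0)],
    [Hit("chr1", 100, 1), Hit("chr1", 5000, 1)],
)
```

In this sketch, `snp` evaluates to `"unique"` and `repeat` to `"not_unique"`. Scanning distances from the smallest upward lets a prediction with one close hit and several distant ones still be accepted as unique.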
Document type:
Poster
JOBIM 2015, Jul 2015, Clermont-Ferrand, France

https://hal.inria.fr/hal-01176492
Contributor: Pierre Peterlongo
Submitted on: Wednesday, July 15, 2015 - 14:53:40
Last modified on: Tuesday, January 16, 2018 - 15:54:20
Document(s) archived on: Friday, October 16, 2015 - 11:05:14

File

posterJOBIM29062015.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01176492, version 1

Citation

Chloé Riou, Claire Lemaitre, Pierre Peterlongo. VCF_creator: Mapping and VCF Creation features in DiscoSnp++. JOBIM 2015, Jul 2015, Clermont-Ferrand, France. 〈hal-01176492〉
