# VCF_creator: Mapping and VCF Creation features in DiscoSnp++

GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique, IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : DiscoSnp++ detects genomic variants, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs), from raw read set(s) without any reference genome. This de novo method finds variations among or between individuals, in particular for non-model organisms, for which a reference genome is often unavailable or of poor quality. Because of their number and their distribution along the genome, these markers are used in many biological areas: agronomy, health, medicine, and the environment. To facilitate downstream analysis and selection of these variants, we propose VCF_creator, a new feature of DiscoSnp++. Starting from the DiscoSnp++ predictions and a reference genome, VCF_creator aligns the predictions to the reference and outputs the variants in VCF (Variant Call Format), the text file format commonly used to report variants. The pipeline is as follows. First, all predictions are aligned to the reference using a mapping tool (BWA). VCF_creator then analyses the mapping results to extract the mapping positions and to distinguish unique mappings from multiple ones. A prediction is validated as unique if there exists a distance, expressed as a number of substitutions, for which both alleles of the variant have a unique mapping position on the reference genome. For each variant, the VCF output file gives the genomic position and the name of the sequence to which it aligns, the reference and alternative alleles, and the DiscoSnp++ information (coverage for each dataset, genotype, rank ...). VCF_creator was applied to simulated data from human chromosome 1. The results show that the majority of false positives predicted by DiscoSnp++ correspond to unaligned or multiply aligned predictions. Therefore, this tool not only makes downstream analyses easier but also improves the precision of DiscoSnp++ predictions.
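
The uniqueness rule stated in the abstract can be illustrated with a minimal Python sketch. This is not the VCF_creator implementation: the function name `unique_mapping_distance`, the `Hits` input layout, and the `max_distance` parameter are hypothetical. The sketch also assumes that "a distance" means "at most d substitutions", so hits are accumulated as d grows, and it does not check that both alleles map to the same position, since the abstract does not say whether that is required.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical input layout (an assumption, not the tool's data model):
# for one allele, substitution distance -> list of (sequence name, position) hits.
Hits = Dict[int, List[Tuple[str, int]]]


def unique_mapping_distance(upper: Hits, lower: Hits, max_distance: int) -> Optional[int]:
    """Return the smallest distance d at which both alleles have exactly one
    mapping position within d substitutions, or None if no such d exists."""
    upper_seen: Set[Tuple[str, int]] = set()
    lower_seen: Set[Tuple[str, int]] = set()
    for d in range(max_distance + 1):
        # Accumulate every position reachable with at most d substitutions.
        upper_seen.update(upper.get(d, []))
        lower_seen.update(lower.get(d, []))
        if len(upper_seen) == 1 and len(lower_seen) == 1:
            return d
    # Either an allele never mapped, or an allele mapped to several positions.
    return None


# One allele matches the reference exactly, the other carries the substitution.
snp_upper: Hits = {0: [("chr1", 100)]}
snp_lower: Hits = {1: [("chr1", 100)]}
validated_at = unique_mapping_distance(snp_upper, snp_lower, max_distance=3)
```

Under these assumptions, a prediction for which the function returns `None` falls into the unaligned or multiply aligned group that the abstract associates with most false positives.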
Document type :
Poster communications

https://hal.inria.fr/hal-01176492
Contributor : Pierre Peterlongo
Submitted on : Wednesday, July 15, 2015 - 2:53:40 PM
Last modification on : Monday, November 22, 2021 - 1:52:23 PM
Long-term archiving on : Friday, October 16, 2015 - 11:05:14 AM

### File

posterJOBIM29062015.pdf
Files produced by the author(s)

### Identifiers

• HAL Id : hal-01176492, version 1

### Citation

Chloé Riou, Claire Lemaitre, Pierre Peterlongo. VCF_creator: Mapping and VCF Creation features in DiscoSnp++. JOBIM 2015, Jul 2015, Clermont-Ferrand, France. ⟨hal-01176492⟩
