Poster communication

## VCF_creator: Mapping and VCF Creation features in DiscoSnp++

Chloé Riou, Claire Lemaitre, Pierre Peterlongo

#### Abstract

The software DiscoSnp++ is designed to detect genomic variants such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) from one or more raw read sets, without any reference genome. This de novo method makes it possible to find variants within or between individuals, in particular for non-model organisms, for which a reference genome is often unavailable or of poor quality. Because of their number and their distribution along the genome, these markers are used in many areas of biology: agronomy, health, medicine, and environmental science. To facilitate downstream analysis and selection of these variants, we propose VCF_creator, a new feature of DiscoSnp++. Starting from the DiscoSnp++ predictions and a reference genome, VCF_creator aligns the predictions to the reference and outputs the variants in VCF (Variant Call Format), the text file format commonly used to report variants. The pipeline works as follows. First, all predictions are aligned to the reference using a mapping tool (BWA). VCF_creator then analyses the mapping results to extract the mapping position of each prediction and to distinguish uniquely mapped predictions from multiply mapped ones. A prediction is validated as unique if there exists a distance, expressed as a number of substitutions, at which both alleles of the variant have a unique mapping position on the reference genome. For each variant, the VCF output file gives the genomic position and the name of the sequence to which it is aligned, the reference and alternative alleles, and the DiscoSnp++ information (coverage in each dataset, genotype, rank ...). VCF_creator was applied to data simulated from human chromosome 1. Results show that the majority of the false positives predicted by DiscoSnp++ correspond to unaligned or multiply aligned predictions. Therefore, this tool not only makes downstream analyses easier but also improves the precision of DiscoSnp++ predictions.
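
The uniqueness rule described in the abstract can be illustrated with a minimal Python sketch. It is not the VCF_creator implementation. It assumes that the hits of each allele are already available as `(sequence name, position, number of substitutions)` tuples, and that "a distance" means "at most `d` substitutions". The function names (`hits_within`, `classify`, `vcf_line`), the `max_distance` parameter, the three status labels and the `RANK` INFO key are hypothetical, introduced only for this illustration. The only part taken from the VCF format itself is the order of the eight fixed columns.

```python
from typing import Dict, List, Set, Tuple

# One mapping hit: (reference sequence name, 1-based position, substitutions).
Hit = Tuple[str, int, int]


def hits_within(hits: List[Hit], d: int) -> Set[Tuple[str, int]]:
    """Distinct mapping positions reached with at most d substitutions."""
    return {(chrom, pos) for chrom, pos, subs in hits if subs <= d}


def classify(allele_a: List[Hit], allele_b: List[Hit], max_distance: int) -> str:
    """Label a prediction as 'unique', 'multiple' or 'unmapped' (hypothetical labels)."""
    # Unique: some distance d exists at which BOTH alleles have exactly one position.
    for d in range(max_distance + 1):
        if len(hits_within(allele_a, d)) == 1 and len(hits_within(allele_b, d)) == 1:
            return "unique"
    # Otherwise, report 'multiple' if either allele has several positions.
    a = hits_within(allele_a, max_distance)
    b = hits_within(allele_b, max_distance)
    if len(a) > 1 or len(b) > 1:
        return "multiple"
    return "unmapped"


def vcf_line(chrom: str, pos: int, var_id: str, ref: str, alt: str,
             info: Dict[str, str]) -> str:
    """Format one VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO."""
    info_str = ";".join(f"{key}={value}" for key, value in info.items()) or "."
    # QUAL and FILTER are left as '.', the VCF missing-value marker.
    return "\t".join([chrom, str(pos), var_id, ref, alt, ".", ".", info_str])


if __name__ == "__main__":
    # The reference allele maps exactly; the alternative needs one substitution.
    ref_allele = [("chr1", 1042, 0)]
    alt_allele = [("chr1", 1042, 1)]
    status = classify(ref_allele, alt_allele, max_distance=3)
    print(status)
    print(vcf_line("chr1", 1042, "SNP_1", "A", "G", {"RANK": "0.98"}))
```

In this sketch a prediction whose alleles only become mappable at different edit distances is still accepted, because the test is cumulative over `d`. A stricter variant could also require both alleles to map to the same position.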

#### Domains

Computer Science [cs] / Bioinformatics [q-bio.QM]

### Dates and versions

hal-01176492 , version 1 (15-07-2015)

### Identifiers

• HAL Id : hal-01176492 , version 1

### Cite

Chloé Riou, Claire Lemaitre, Pierre Peterlongo. VCF_creator: Mapping and VCF Creation features in DiscoSnp++. JOBIM 2015, Jul 2015, Clermont-Ferrand, France. ⟨hal-01176492⟩
