Performance analysis of methods to infer missing genotypes

Abstract: Complex analyses such as genetic mapping, disease association studies, and disease mapping in environmental health and environmental epidemiology rely on high-throughput genotyping techniques. These analyses closely examine genetic variation between subjects, in particular through Single Nucleotide Polymorphisms (SNPs). Although current genotyping techniques meet high quality standards, one still has to cope with missing data and genotyping errors. Typically, the percentage of missing data (missing calls) now ranges from 5% to 10%. Computational inference of missing data is a challenging alternative to re-genotyping the missing regions. This document first briefly reviews the various methods designed to infer missing SNPs. It then reports the performances published for these inference methods, and carefully describes the characteristics of the benchmarks generated by the methods' designers (percentage of missing data, correlation between SNPs). We show that most methods achieve accuracies in the range of 90% to 96%. However, we also emphasize that no algorithm guarantees consistently high accuracy: an algorithm may perform well on some benchmarks and relatively poorly on others.
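
The benchmark idea mentioned in the abstract (hide a known fraction of genotype calls, infer them, and score the fraction recovered) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the report: the 0/1/2 genotype encoding, the toy data, the function names, and the most-frequent-genotype baseline, which merely stands in for the inference methods the report reviews.

```python
import random
from collections import Counter

MISSING = None  # hypothetical marker for a missing call


def mask_genotypes(genotypes, missing_rate, rng):
    """Copy the matrix and hide about missing_rate of the calls.

    Returns the masked matrix and the list of hidden (subject, snp) positions.
    """
    masked = [row[:] for row in genotypes]
    cells = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    hidden = rng.sample(cells, round(missing_rate * len(cells)))
    for i, j in hidden:
        masked[i][j] = MISSING
    return masked, hidden


def impute_most_frequent(masked):
    """Fill each missing call with the most frequent observed genotype at that SNP.

    A naive baseline chosen for illustration, not a method from the report.
    """
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not MISSING]
        if not observed:
            continue  # nothing observed at this SNP: leave the calls missing
        mode = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[j] is MISSING:
                row[j] = mode
    return filled


def imputation_accuracy(truth, imputed, hidden):
    """Fraction of hidden calls whose inferred genotype equals the true one."""
    correct = sum(truth[i][j] == imputed[i][j] for i, j in hidden)
    return correct / len(hidden)


# Toy data (assumed): 5 subjects x 4 SNPs, genotypes coded as 0, 1 or 2.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 1, 2, 0],
]
masked, hidden = mask_genotypes(genotypes, missing_rate=0.10, rng=random.Random(0))
imputed = impute_most_frequent(masked)
print("accuracy on hidden calls:", imputation_accuracy(genotypes, imputed, hidden))
```

Varying `missing_rate`, or the correlation between SNP columns in the toy data, gives a rough feel for why one method's accuracy can differ from one benchmark to another.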

Contributor: Christine Sinoquet
Submitted on: Sunday, 5 October 2008 - 17:27:40
Last modified on: Thursday, 5 April 2018 - 10:36:49
Document(s) archived on: Tuesday, 21 September 2010 - 17:48:20




  • HAL Id : inria-00326741, version 2



Christine Sinoquet. Performance analysis of methods to infer missing genotypes. [Research Report] 2008. 〈inria-00326741v2〉


