
Performance analysis of methods to infer missing genotypes

Abstract : Complex analyses such as genetic mapping, disease association studies, and disease mapping in the context of environmental health and environmental epidemiology rely on high-throughput genotyping techniques. These analyses thoroughly examine genetic variations between subjects, in particular Single Nucleotide Polymorphisms (SNPs). Although current genotyping techniques meet high quality standards, one still has to cope with missing data and genotyping errors. Typically, the percentage of missing data, or missing calls, now lies in the range [5%,10%]. Computational inference of missing data is a challenging alternative to re-genotyping the missing regions. This document first briefly reviews the various methods designed to infer missing SNPs. It then reports the performances published for these inference methods and carefully describes the characteristics of the benchmarks generated by the methods' designers (missing data percentage, correlation between SNPs). We show that most methods provide accuracies in the range [90%,96%]. However, we also emphasize that no algorithm guarantees consistently high accuracy: an algorithm may perform well on some benchmarks and show relatively poor results on others.
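As an illustration of the kind of benchmark the abstract refers to, the sketch below hides a chosen percentage of known genotype calls, infers them, and measures accuracy on the hidden calls only. It is a minimal sketch under stated assumptions, not a method taken from the report: the 0/1/2 genotype coding, the function names, and the most-frequent-genotype baseline are all hypothetical choices made for this example.

```python
import random

MISSING = None  # assumed marker for a missing call


def mask_genotypes(genotypes, missing_rate, rng):
    """Hide a fraction `missing_rate` of the calls.

    Returns the masked copy and the sorted list of hidden positions.
    """
    n = len(genotypes)
    k = round(missing_rate * n)
    hidden = sorted(rng.sample(range(n), k))
    masked = list(genotypes)
    for i in hidden:
        masked[i] = MISSING
    return masked, hidden


def impute_most_frequent(masked):
    """Toy baseline (an assumption, not a reviewed method):
    fill every missing call with the most frequent observed genotype."""
    counts = {}
    for g in masked:
        if g is not MISSING:
            counts[g] = counts.get(g, 0) + 1
    if not counts:
        raise ValueError("no observed genotype to impute from")
    # sorted() makes ties resolve to the smallest genotype code
    fill = max(sorted(counts), key=counts.get)
    return [fill if g is MISSING else g for g in masked]


def accuracy(truth, imputed, hidden):
    """Fraction of hidden calls that were inferred correctly."""
    if not hidden:
        raise ValueError("no hidden calls to score")
    correct = sum(1 for i in hidden if truth[i] == imputed[i])
    return correct / len(hidden)


if __name__ == "__main__":
    # One SNP genotyped in 100 subjects, coded as minor-allele counts (assumed coding).
    truth = [0] * 80 + [1] * 15 + [2] * 5
    masked, hidden = mask_genotypes(truth, 0.10, random.Random(0))
    imputed = impute_most_frequent(masked)
    print(f"accuracy on hidden calls: {accuracy(truth, imputed, hidden):.2f}")
```

Scoring only the hidden positions matters: including the calls that were never masked would inflate the reported accuracy. A real benchmark would also vary the missing-data percentage and the correlation between SNPs, the two characteristics the abstract names.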

Cited literature : 18 references
Contributor : Christine Sinoquet
Submitted on : Sunday, October 5, 2008 - 5:27:40 PM
Last modification on : Wednesday, April 27, 2022 - 3:47:28 AM
Long-term archiving on : Tuesday, September 21, 2010 - 5:48:20 PM


Files produced by the author(s)


  • HAL Id : inria-00326741, version 2



Christine Sinoquet. Performance analysis of methods to infer missing genotypes. [Research Report] 2008. ⟨inria-00326741v2⟩


