Imaging genetics: bio-informatics and bio-statistics challenges

Abstract: The IMAGEN study, a very large European research project, seeks to identify and characterize biological and environmental factors that influence teenagers' mental health. To this aim, the consortium plans to collect data for more than 2000 subjects at 8 neuroimaging centres. These data comprise neuroimaging data, behavioral tests (up to 5 hours of testing per subject), and white blood samples, which are collected and processed to obtain 650k single nucleotide polymorphisms (SNPs) per subject. Data for more than 1000 subjects have already been collected. We describe the statistical aspects of these data and the challenges, such as the multiple comparison problem, created by such a large imaging genetics study (i.e., 650k SNPs and 50k data points per neuroimage). We also suggest possible strategies and present some first investigations using univariate or multivariate methods in association with re-sampling techniques. Specifically, because the number of variables is very high, we first reduce the data size and then use multivariate techniques (CCA, PLS) in association with re-sampling techniques.
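
To make the multiple comparison problem concrete, the following is a minimal sketch of one generic re-sampling strategy for a univariate screen: a max-statistic permutation test, shown next to the Bonferroni threshold for comparison. It is an illustration on simulated toy data, not the analysis pipeline of the study. The function names (`simulate`, `max_stat_permutation_test`, `bonferroni_threshold`), the toy sizes, and the effect size are all assumptions introduced here.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_abs_corr(snps, phenotype):
    """Largest absolute SNP-phenotype correlation across all SNPs."""
    return max(abs(pearson(s, phenotype)) for s in snps)


def max_stat_permutation_test(snps, phenotype, n_perm=200, seed=0):
    """Family-wise p-value for the strongest association.

    Shuffling the phenotype breaks any SNP-phenotype link while keeping
    the correlation structure among SNPs, so the permuted maxima form a
    null distribution that already accounts for testing every SNP.
    """
    rng = random.Random(seed)
    observed = max_abs_corr(snps, phenotype)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_abs_corr(snps, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation.
    return observed, (exceed + 1) / (n_perm + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def simulate(n_subjects=60, n_snps=40, effect=1.5, seed=1):
    """Toy data (assumption): genotypes coded 0/1/2, only SNP 0 affects the phenotype."""
    rng = random.Random(seed)
    snps = [[rng.choice((0, 1, 2)) for _ in range(n_subjects)]
            for _ in range(n_snps)]
    phenotype = [effect * snps[0][i] + rng.gauss(0, 1)
                 for i in range(n_subjects)]
    return snps, phenotype


if __name__ == "__main__":
    snps, phenotype = simulate()
    observed, p_value = max_stat_permutation_test(snps, phenotype)
    print("max |r| =", observed, "family-wise p =", p_value)
    # Every SNP against every image measurement: 650k x 50k tests.
    print("Bonferroni threshold:", bonferroni_threshold(0.05, 650_000 * 50_000))
```

Testing every SNP against every image measurement gives 650,000 × 50,000 = 3.25 × 10^10 tests, so a Bonferroni correction at a family-wise level of 0.05 requires per-test p-values below roughly 1.5 × 10^-12. The permutation approach instead calibrates against the null distribution of the maximum statistic, which is usually less conservative when the variables are correlated, at the cost of recomputing the whole screen for each permutation.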
