How to remove or control confounds in predictive models, with applications to brain biomarkers

Darya Chyzhyk; Gaël Varoquaux; Michael Milham; Bertrand Thirion

doi:10.1093/gigascience/giac014

Journal article, GigaScience, Year: 2022

Darya Chyzhyk

Role: Author
PersonId: 1130222

Modelling brain structure, function and variability based on high-field MRI data

Gaël Varoquaux

Role: Author
PersonId: 5878
IdHAL: gael-varoquaux
ORCID: 0000-0003-1076-5122
IdRef: 126239894

Méthodes computationnelles et mathématiques pour comprendre la société et la santé à partir de données

Michael Milham

Role: Author

Child Mind Institute

Nathan S. Kline Institute for Psychiatric Research

Bertrand Thirion

Role: Author
PersonId: 760493
ORCID: 0000-0001-5018-7895
IdRef: 080779565

Modèles et inférence pour les données de Neuroimagerie

Abstract

Background: With increasing data sizes and more easily available computational methods, neurosciences rely more and more on predictive modeling with machine learning, e.g., to extract disease biomarkers. Yet, a successful prediction may capture a confounding effect correlated with the outcome instead of brain features specific to the outcome of interest. For instance, because patients tend to move more in the scanner than controls, imaging biomarkers of a disease condition may mostly reflect head motion, leading to inefficient use of resources and wrong interpretation of the biomarkers.

Results: Here we study how to adapt statistical methods that control for confounds to predictive modeling settings. We review how to train predictors that are not driven by such spurious effects. We also show how to measure the unbiased predictive accuracy of these biomarkers, based on a confounded dataset. For this purpose, cross-validation must be modified to account for the nuisance effect. To guide understanding and practical recommendations, we apply various strategies to assess predictive models in the presence of confounds on simulated data and population brain imaging settings. Theoretical and empirical studies show that deconfounding should not be applied to the train and test data jointly: modeling the effect of confounds, on the training data only, should instead be decoupled from removing confounds.

Conclusions: Cross-validation that isolates nuisance effects gives an additional piece of information: confound-free prediction accuracy.
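The recommendation to model confounds on the training data only, and then remove them from both train and test data, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single confound, a linear confound model fitted per feature by ordinary least squares, and simulated data. The names `fit_confound_model` and `remove_confound` are hypothetical, chosen for this example.

```python
import random
from statistics import fmean


def fit_confound_model(X_train, z_train):
    """Fit, for each feature column, the least-squares line
    feature ~ slope * z + intercept, using training data only.
    Returns one (slope, intercept) pair per feature."""
    z_mean = fmean(z_train)
    z_var = sum((z - z_mean) ** 2 for z in z_train)
    if z_var == 0:
        raise ValueError("confound is constant on the training set")
    params = []
    for j in range(len(X_train[0])):
        col = [row[j] for row in X_train]
        x_mean = fmean(col)
        cov = sum((z - z_mean) * (x - x_mean) for z, x in zip(z_train, col))
        slope = cov / z_var
        params.append((slope, x_mean - slope * z_mean))
    return params


def remove_confound(X, z, params):
    """Subtract the confound effect predicted by an already fitted model.
    No parameter is re-estimated here, so this is safe to call on test data."""
    return [
        [x - (slope * zi + intercept) for x, (slope, intercept) in zip(row, params)]
        for row, zi in zip(X, z)
    ]


# Simulated data (an assumption of this sketch): feature 0 is driven by the
# confound z, feature 1 is pure noise.
rng = random.Random(0)
z_all = [rng.gauss(0, 1) for _ in range(200)]
X_all = [[2.0 * z + rng.gauss(0, 1), rng.gauss(0, 1)] for z in z_all]

# Split first, then fit the confound model on the training part only.
X_train, X_test = X_all[:150], X_all[150:]
z_train, z_test = z_all[:150], z_all[150:]

params = fit_confound_model(X_train, z_train)
X_train_clean = remove_confound(X_train, z_train, params)
X_test_clean = remove_confound(X_test, z_test, params)
```

Keeping the fitting step and the removal step as separate functions is what prevents the test data from influencing the confound model; inside cross-validation, the fit would be repeated on the training part of each fold.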

Keywords

confound, subsampling, phenotype, predictive models, biomarkers, statistical testing, deconfounding

Domains

Medical imaging

Main file

giac014.pdf (3.03 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03607651

Submitted on: Monday, 14 March 2022, 10:20:45

Last modified on: Wednesday, 3 April 2024, 10:20:13

Long-term archiving on: Wednesday, 15 June 2022, 18:22:47

Dates and versions

hal-03607651, version 1 (14-03-2022)

Identifiers

HAL Id: hal-03607651, version 1
DOI: 10.1093/gigascience/giac014

Cite

Darya Chyzhyk, Gaël Varoquaux, Michael Milham, Bertrand Thirion. How to remove or control confounds in predictive models, with applications to brain biomarkers. GigaScience, 2022, 11, ⟨10.1093/gigascience/giac014⟩. ⟨hal-03607651⟩

Collections

CEA INRIA INRIA2 CEA-UPSAY UNIV-PARIS-SACLAY JOLIOT CEA-DRF NEUROSPIN ANR GS-ENGINEERING GS-COMPUTER-SCIENCE GS-LIFE-SCIENCES-HEALTH

157 views

389 downloads
