Cross-validation failure: small sample sizes lead to large error bars

Gaël Varoquaux

doi:10.1016/j.neuroimage.2017.06.061

Journal article, NeuroImage, Year: 2017

(1)

Gaël Varoquaux

Role: Author
PersonId: 5878
IdHAL: gael-varoquaux
ORCID: 0000-0003-1076-5122
IdRef: 126239894

Modelling brain structure, function and variability based on high-field MRI data

Abstract

Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness of the error bars of cross-validation, which are often underestimated. Simple experiments show that the sample sizes of many neuroimaging studies inherently lead to large error bars, e.g. ±10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers or methods development, where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in heterogeneity of the data.
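The order of magnitude quoted in the abstract can be checked with a back-of-the-envelope calculation. The sketch below is not the paper's experiments: it assumes that test predictions are independent and that measured accuracy follows a binomial law, and it uses the normal approximation for a 95% interval. The function name `binomial_half_width` and the default accuracy of 0.5 (chance level for two balanced classes) are illustrative choices.

```python
import math


def binomial_half_width(n_samples, accuracy=0.5, z=1.96):
    """Half-width of the normal-approximation interval for an accuracy
    measured on n_samples independent test samples.

    Assumption: each prediction is an independent Bernoulli trial,
    so this reflects sampling noise only; z=1.96 gives a 95% interval.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Error bar on the measured accuracy for a few sample sizes
for n in (30, 100, 300, 1000):
    print(f"n={n:5d}: +/- {100 * binomial_half_width(n):.1f}%")
```

Under these assumptions, 100 samples give a half-width of about 9.8%, in line with the ±10% figure above. The width shrinks only as the square root of the sample size.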

Keywords

cross-validation; statistics; decoding; fMRI; model selection; MVPA; biomarkers; Comments and Controversies

Domains

Bioinformatics [q-bio.QM]; Neuroscience; Psychology; Machine Learning [stat.ML]; Methodology [stat.ME]

Main file

paper.pdf (789.04 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01545002

Submitted on: Thursday, 22 June 2017, 13:28:59

Last modified on: Wednesday, 3 April 2024, 10:20:13

Long-term archiving on: Wednesday, 10 January 2018, 14:43:37

Dates and versions

hal-01545002, version 1 (22-06-2017)

Identifiers

HAL Id: hal-01545002, version 1
arXiv: 1706.07581
DOI: 10.1016/j.neuroimage.2017.06.061

Cite

Gaël Varoquaux. Cross-validation failure: small sample sizes lead to large error bars. NeuroImage, 2017, ⟨10.1016/j.neuroimage.2017.06.061⟩. ⟨hal-01545002⟩

Collections

CEA INRIA INRIA2 CEA-UPSAY UNIV-PARIS-SACLAY JOLIOT CEA-DRF NEUROSPIN ANR GS-ENGINEERING GS-COMPUTER-SCIENCE

4592 views

3383 downloads
