Fast clustering for scalable statistical analysis on structured images

Abstract: The use of brain images as markers for diseases or behavioral differences is challenged by small effect sizes and the ensuing lack of power, an issue that has incited researchers to rely more systematically on large cohorts. Coupled with resolution increases, this leads to very large datasets. A striking example in the case of brain imaging is that of the Human Connectome Project: 20 Terabytes of data and growing. The resulting data deluge poses severe challenges regarding the tractability of some processing steps (discriminant analysis, multivariate models) due to the memory demands posed by these data. In this work, we revisit dimension reduction approaches, such as random projections, with the aim of replacing costly function evaluations by cheaper ones while decreasing the memory requirements. Specifically, we investigate the use of alternate schemes, based on fast clustering, that are well suited for signals exhibiting a strong spatial structure, such as anatomical and functional brain images. Our contribution is twofold: i) we propose a linear-time clustering scheme that bypasses the percolation issues inherent in fast clustering algorithms and thus provides compressions nearly as good as traditional quadratic-complexity variance-minimizing clustering schemes; ii) we show that cluster-based compression can have the virtuous effect of removing high-frequency noise, actually improving subsequent estimation steps. As a consequence, the proposed approach yields very accurate models on several large-scale problems, with impressive gains in computational efficiency, making it possible to analyze large datasets.
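To make the idea of cluster-based compression concrete, here is a minimal sketch in Python. It is not the clustering scheme proposed in the paper: the clusters are fixed contiguous blocks of a 1D signal, which is an assumption made purely for illustration, and the names `compress`, `decompress`, `block_size` and the synthetic data are all hypothetical. The sketch shows how averaging features within clusters reduces an (n, p) data matrix to (n, k), how a Gaussian random projection reaches the same size, and how cluster averaging can attenuate high-frequency noise on a spatially smooth signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress(X, labels, n_clusters):
    """Average the features of X (n, p) within each cluster, giving (n, k)."""
    p = X.shape[1]
    # One-hot assignment matrix: A[j, c] = 1 if feature j belongs to cluster c.
    A = np.zeros((p, n_clusters))
    A[np.arange(p), labels] = 1.0
    counts = A.sum(axis=0)          # number of features per cluster
    return (X @ A) / counts

def decompress(X_red, labels):
    """Map cluster means back to feature space: (n, k) -> (n, p)."""
    return X_red[:, labels]

# Synthetic, spatially smooth signal corrupted by white noise (illustrative only).
n, p, block_size = 20, 1000, 10
k = p // block_size
t = np.linspace(0.0, 1.0, p)
clean = np.tile(np.sin(2 * np.pi * t), (n, 1))
X = clean + 0.5 * rng.standard_normal((n, p))

# Assumption: clusters are contiguous blocks, standing in for a real
# spatially constrained clustering.
labels = np.arange(p) // block_size

X_red = compress(X, labels, k)       # compressed representation, shape (n, k)
X_back = decompress(X_red, labels)   # piecewise-constant approximation, shape (n, p)

# Gaussian random projection to the same dimension, for comparison of size only.
R = rng.standard_normal((p, k)) / np.sqrt(k)
X_rp = X @ R

mse_noisy = np.mean((X - clean) ** 2)
mse_denoised = np.mean((X_back - clean) ** 2)
print(X_red.shape, X_rp.shape)
print(mse_noisy, mse_denoised)
```

In this toy setting the compressed-then-decompressed data lie closer to the clean signal than the noisy input does, because averaging within a block reduces the noise variance while the smooth signal changes little inside each block. On real images the outcome depends on how well the clusters follow the spatial structure of the signal.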
Document type:
Conference paper
ICML Workshop on Statistics, Machine Learning and Neuroscience (Stamlins 2015), Jul 2015, Lille, France

Cited literature: 19 references

https://hal.inria.fr/hal-01229023
Contributor: Bertrand Thirion
Submitted on: Monday, April 11, 2016 - 09:31:53
Last modified on: Tuesday, July 3, 2018 - 11:26:13

File

paper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01229023, version 2
  • arXiv: 1511.04898

Citation

Bertrand Thirion, Andrés Hoyos-Idrobo, Jonas Kahn, Gaël Varoquaux. Fast clustering for scalable statistical analysis on structured images. ICML Workshop on Statistics, Machine Learning and Neuroscience (Stamlins 2015), Jul 2015, Lille, France. 〈hal-01229023v2〉
