Post-clustering difference testing: valid inference and practical considerations

Clustering is an unsupervised analysis method that groups samples into homogeneous, well-separated subgroups of observations, called clusters. To interpret the clusters, statistical hypothesis testing is often used to infer which variables significantly separate the estimated clusters from each other. However, the hypotheses tested are then data-driven, since they are derived from the clustering results. This double use of the data causes traditional hypothesis tests to fail to control the Type I error rate, particularly because of the uncertainty in the clustering process and the artificial differences it may create. We propose three novel statistical hypothesis tests that account for the clustering process. Our tests efficiently control the Type I error rate by identifying only variables that contain a true signal separating groups of observations.
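The Type I error inflation described in the abstract can be illustrated with a small simulation. The sketch below is not one of the three tests proposed in the paper. It only shows the problem they address, under simplifying assumptions: a single Gaussian variable with no true group structure, a median split standing in for a clustering algorithm, and a Welch-type statistic with a normal approximation for the p-value. All function names are hypothetical and chosen for illustration.

```python
import math
import random
import statistics


def two_sample_pvalue(a, b):
    """Two-sided p-value for a difference in means.

    Uses the Welch statistic with a normal approximation, which is
    adequate for the group sizes used here (an assumption of this sketch).
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def median_split(x, rng):
    """Data-driven grouping: a stand-in for clustering one variable into two clusters."""
    s = sorted(x)
    half = len(s) // 2
    return s[:half], s[half:]


def random_split(x, rng):
    """Grouping chosen independently of the observed values."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    half = len(x) // 2
    return [x[i] for i in idx[:half]], [x[i] for i in idx[half:]]


def rejection_rate(split, n_sim=500, n=100, alpha=0.05, seed=0):
    """Fraction of null datasets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Null data: one homogeneous Gaussian population, no true clusters.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a, b = split(x, rng)
        if two_sample_pvalue(a, b) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print("data-driven groups:", rejection_rate(median_split))
    print("independent groups:", rejection_rate(random_split))
```

When the groups are derived from the same values that are then tested, the test rejects in nearly every simulated dataset even though no true separation exists. When the groups are chosen independently of the data, the rejection rate stays close to the nominal level.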

Keywords

Clustering, hypothesis testing, double-dipping, circular analysis, selective inference, multimodality test, Dip Test

Domains

Methodology [stat.ME]

Main file

2210.13172.pdf (3.44 MB)

Origin: Files produced by the author(s)

Contributor: Benjamin Hivert

https://hal.science/hal-03889565

Submitted on: Thursday, December 8, 2022, 10:02:18

Last modified on: Friday, March 15, 2024, 03:19:25

Dates and versions

hal-03889565, version 1 (December 8, 2022)

Identifiers

HAL Id: hal-03889565, version 1
arXiv: 2210.13172
DOI: 10.1016/j.csda.2023.107916

Cite

Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P. Hejblum. Post-clustering difference testing: valid inference and practical considerations. Computational Statistics and Data Analysis, 2024, 193, pp.107916. ⟨10.1016/j.csda.2023.107916⟩. ⟨hal-03889565⟩

Collections

INSERM INRIA UPEC INRIA-SILICONVALLEY INRIA2 ANR U1219
