Conference paper, Year: 2023

Are labels informative in semi-supervised learning? Estimating and leveraging the missing-data mechanism

Abstract

Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of “informative” labels, which occur when some classes are more likely to be labeled than others. In the missing data literature, such labels are called missing not at random. In this paper, we propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm, including those using data augmentation. We also propose a likelihood ratio test to assess whether or not labels are indeed informative. Finally, we demonstrate the performance of the proposed methods on different datasets, in particular on two medical datasets for which we design pseudo-realistic missing data scenarios.
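
To illustrate the inverse propensity weighting idea mentioned in the abstract, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the class-wise labeling probabilities (the `propensity` vector) are already known, whereas the paper estimates them from data, and it only covers the labeled cross-entropy term of an SSL objective. The function name `ipw_cross_entropy` and all parameter names are hypothetical.

```python
import numpy as np


def ipw_cross_entropy(probs, labels, labeled_mask, propensity):
    """Inverse-propensity-weighted cross-entropy on the labeled examples.

    probs:        (n, K) predicted class probabilities
    labels:       (n,) integer class labels; entries where labeled_mask
                  is False are never read (e.g. fill them with -1)
    labeled_mask: (n,) booleans, True where the label was observed
    propensity:   (K,) ASSUMED known P(label observed | class = k)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    labeled_mask = np.asarray(labeled_mask, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)

    n = probs.shape[0]
    idx = np.flatnonzero(labeled_mask)  # positions of labeled examples
    y = labels[idx]                     # their observed classes

    # Per-example negative log-likelihood of the observed class.
    nll = -np.log(probs[idx, y])

    # Classes that are rarely labeled get a larger weight.
    weights = 1.0 / propensity[y]

    # Dividing by n (not by the number of labeled examples) makes the
    # weighted sum an unbiased estimate of the fully labeled risk.
    return float(np.sum(weights * nll) / n)


# Toy usage: class 1 is labeled far less often than class 0.
toy_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
toy_labels = np.array([0, 1, -1, -1])
toy_mask = np.array([True, True, False, False])
toy_loss = ipw_cross_entropy(toy_probs, toy_labels, toy_mask, [0.8, 0.2])
```

When every class has the same propensity, the weights reduce to a constant factor and the loss is a rescaled version of the usual labeled cross-entropy. The weighting only changes the objective when labels are informative, that is, when propensities differ across classes.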

Domains

Other [stat.ML]
Main file
main.pdf (459.23 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03982898, version 1 (14-02-2023)

License

Attribution

Cite

Aude Sportisse, Hugo Schmutz, Olivier Humbert, Charles Bouveyron, Pierre-Alexandre Mattei. Are labels informative in semi-supervised learning? Estimating and leveraging the missing-data mechanism. International Conference on Machine Learning (ICML), 2023, Hawaii, United States. ⟨hal-03982898⟩