Learning from Positive and Unlabeled Examples

François Denis; Rémi Gilleron; Fabien Letouzey

doi:10.1016/j.tcs.2005.09.007

Journal article, Theoretical Computer Science, Year: 2005


François Denis

Role: Author
PersonId : 844150

Laboratoire d'informatique Fondamentale de Marseille - UMR 6166

Rémi Gilleron

Role: Author
PersonId : 184332
IdHAL : remi-gilleron
ORCID : 0000-0002-1583-5938
IdRef : 061168718

Modeling Tree Structures, Machine Learning, and Information Extraction

Groupe de Recherche en Apprentissage Automatique

Fabien Letouzey

Role: Author

Groupe de Recherche en Apprentissage Automatique

Abstract

In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Moreover, for some binary classification problems, positive examples, that is, examples of the target class, are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? In this paper, we investigate the design of learning algorithms that use positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the Statistical Query learning model to describe these algorithms. Here, we design an algorithm scheme that transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the Statistical Query learning model is learnable from positive statistical queries and instance statistical queries alone, provided that a lower bound on the weight of any target concept $f$ can be estimated in polynomial time. We then design a decision tree induction algorithm, POSC4.5, based on C4.5, that uses only positive and unlabeled examples, and we give experimental results for this algorithm. The case of imbalanced classes, in which one of the two classes (say, the positive class) is heavily underrepresented compared to the other, remains open. This problem is challenging because it arises in many real-world applications.
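
The idea of answering a labeled statistical query from positive and instance queries can be illustrated with a small sketch. It is not the authors' scheme or POSC4.5; it only applies the identity P(x in A, f(x)=1) = P(x in A | f(x)=1) · P(f(x)=1), assuming that positive examples are drawn from the instance distribution restricted to the target concept. The function name `estimate_label_queries`, the toy distribution, and the exactly known weight `pos_weight` are assumptions made for illustration (the abstract only requires that a lower bound on the weight can be estimated).

```python
import random


def estimate_label_queries(pos_sample, unl_sample, in_region, pos_weight):
    """Estimate P(x in A, f(x)=1) and P(x in A, f(x)=0) without negative labels.

    pos_sample : instances known to be positive (positive statistical query)
    unl_sample : unlabeled instances (instance statistical query)
    in_region  : predicate x -> bool defining the region A
    pos_weight : assumed value of P(f(x)=1), the weight of the target concept
    """
    if not pos_sample or not unl_sample:
        raise ValueError("both samples must be non-empty")
    # Positive query: fraction of positive examples falling in A.
    p_region_given_pos = sum(map(in_region, pos_sample)) / len(pos_sample)
    # Instance query: fraction of unlabeled examples falling in A.
    p_region = sum(map(in_region, unl_sample)) / len(unl_sample)
    p_region_and_pos = p_region_given_pos * pos_weight
    # P(A, f=0) = P(A) - P(A, f=1); clip at 0 to absorb sampling noise.
    p_region_and_neg = max(0.0, p_region - p_region_and_pos)
    return p_region_and_pos, p_region_and_neg


# Toy setting (hypothetical): x uniform on [0, 1), target f(x) = 1 iff x < 0.3.
rng = random.Random(0)
positives = [rng.uniform(0.0, 0.3) for _ in range(20000)]
unlabeled = [rng.random() for _ in range(20000)]


def region(x):
    # Region A = [0.2, 0.5): true values are P(A, f=1) = 0.1 and P(A, f=0) = 0.2.
    return 0.2 <= x < 0.5


p_pos, p_neg = estimate_label_queries(positives, unlabeled, region, pos_weight=0.3)
print(f"P(A, f=1) ~ {p_pos:.3f}, P(A, f=0) ~ {p_neg:.3f}")
```

A learner that only needs such probability estimates, for example to score candidate splits in a decision tree, could in principle call a routine like this in place of counting labeled examples.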

Keywords

PAC learning; Statistical Query model; Semi-supervised Learning; Data Mining

Domains

Programming Languages [cs.PL]

Main file

onlypos02.pdf (305.35 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/inria-00536692

Submitted on: Tuesday, 16 November 2010, 17:31:42

Last modified on: Friday, 24 March 2023, 14:52:53

Long-term archiving on: Thursday, 17 February 2011, 03:06:56

Dates and versions

inria-00536692 , version 1 (16-11-2010)

Identifiers

HAL Id : inria-00536692 , version 1
DOI : 10.1016/j.tcs.2005.09.007

Cite

François Denis, Rémi Gilleron, Fabien Letouzey. Learning from Positive and Unlabeled Examples. Theoretical Computer Science, 2005, 348 (1), pp.70-83. ⟨10.1016/j.tcs.2005.09.007⟩. ⟨inria-00536692⟩


Collections

UNIV-LILLE3 LIF CNRS INRIA UNIV-AMU LIFL MOSTRARE INRIA2 LIS-LAB

265 Views

419 Downloads
