# Revisiting classifier two-sample tests

Abstract: The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary classifiers. In particular, construct a dataset by pairing the $n$ examples in $S_P$ with a positive label, and by pairing the $m$ examples in $S_Q$ with a negative label. If the null hypothesis "$P = Q$" is true, then the classification accuracy of a binary classifier on a held-out subset of this dataset should remain near chance level. As we will show, such Classifier Two-Sample Tests (C2ST) learn a suitable representation of the data on the fly, return test statistics in interpretable units, have a simple null distribution, and their predictive uncertainty allows one to interpret where $P$ and $Q$ differ. The goal of this paper is to establish the properties, performance, and uses of C2ST. First, we analyze their main theoretical properties. Second, we compare their performance against a variety of state-of-the-art alternatives. Third, we propose their use to evaluate the sample quality of generative models with intractable likelihoods, such as Generative Adversarial Networks (GANs). Fourth, we showcase the novel application of GANs together with C2ST for causal discovery.
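The procedure described in the abstract (label the two samples, train a classifier, and compare held-out accuracy against chance) can be sketched as follows. This is a minimal illustration using a 1-nearest-neighbour classifier on one-dimensional data, not the authors' implementation; under the null $P = Q$, the held-out accuracy is approximately Gaussian with mean $1/2$ and variance $1/(4\,n_{\text{test}})$, which gives an approximate p-value.

```python
import math
import random


def c2st(sample_p, sample_q, seed=0):
    """Classifier Two-Sample Test sketch on 1-D data.

    Labels points from sample_p as 1 and points from sample_q as 0,
    fits a 1-nearest-neighbour classifier on half of the pooled data,
    and evaluates accuracy on the held-out half. Under the null P = Q,
    accuracy is approximately N(1/2, 1/(4 * n_test)).
    """
    rng = random.Random(seed)
    data = [(x, 1) for x in sample_p] + [(x, 0) for x in sample_q]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]

    def predict(x):
        # 1-nearest-neighbour prediction using the training split only.
        return min(train, key=lambda pair: abs(pair[0] - x))[1]

    accuracy = sum(predict(x) == y for x, y in test) / len(test)
    # One-sided p-value from the approximate null N(1/2, 1/(4 * n_test)).
    z = (accuracy - 0.5) * math.sqrt(4 * len(test))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return accuracy, p_value


if __name__ == "__main__":
    random.seed(1)
    same_p = [random.gauss(0, 1) for _ in range(200)]
    same_q = [random.gauss(0, 1) for _ in range(200)]
    diff_q = [random.gauss(5, 1) for _ in range(200)]
    print(c2st(same_p, same_q))  # accuracy near 0.5: fail to reject P = Q
    print(c2st(same_p, diff_q))  # accuracy near 1.0: reject P = Q
```

Any classifier with calibrated held-out accuracy can replace the nearest-neighbour rule; the paper's point is that the classifier's accuracy itself serves as the test statistic.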
Document type: Conference papers

Cited literature [2 references]

https://hal.inria.fr/hal-01862834
Contributor: Maxime Oquab
Submitted on: Monday, August 27, 2018 - 6:47:05 PM
Last modification on: Tuesday, September 22, 2020 - 3:49:19 AM
Long-term archiving on: Wednesday, November 28, 2018 - 4:37:21 PM

### File

classifier_tests.pdf
Files produced by the author(s)

### Identifiers

• HAL Id: hal-01862834, version 1

### Citation

David Lopez-Paz, Maxime Oquab. Revisiting classifier two-sample tests. International Conference on Learning Representations, Apr 2017, Toulon, France. ⟨hal-01862834⟩
