Conference paper. Year: 2020

Radioactive Data: Tracing Through Training

Abstract

Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (ImageNet), with standard architectures (ResNet-18, VGG-16, DenseNet-121) and training procedures, show that we detect radioactive data with high confidence (p < 0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.
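
The abstract does not say how the p-value is obtained. Purely as an illustration, the sketch below assumes that the mark is a fixed random direction `u` in a d-dimensional feature space, and that detection compares `u` with a vector `w` derived from the trained model using cosine similarity. Under the null hypothesis that the model never saw marked data, `w` is treated as uniformly distributed on the unit sphere, so the probability of a cosine at least as large as the observed one can be computed in closed form or by quadrature. The function names, the inputs, and the Simpson integration are assumptions made for this example, not the authors' implementation.

```python
import math


def cosine_similarity(u, w):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, w))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_u * norm_w)


def cosine_p_value(tau, d, steps=10_000):
    """P(cos >= tau) when one direction is uniform on the unit sphere in R^d.

    The angle theta between a fixed and a uniformly random direction has
    density proportional to sin(theta)^(d-2) on [0, pi]. The tail mass is
    integrated with Simpson's rule (an illustrative choice).
    """
    if d < 2 or not -1.0 <= tau <= 1.0:
        raise ValueError("need d >= 2 and tau in [-1, 1]")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    theta_max = math.acos(tau)
    h = theta_max / steps

    def density(theta):
        return math.sin(theta) ** (d - 2)

    total = density(0.0) + density(theta_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    tail = total * h / 3

    # Integral of sin(theta)^(d-2) over [0, pi], via log-gamma for stability.
    full = math.sqrt(math.pi) * math.exp(
        math.lgamma((d - 1) / 2) - math.lgamma(d / 2)
    )
    return tail / full


if __name__ == "__main__":
    # Hypothetical mark direction and model-derived vector in a toy 3-d space.
    u = [1.0, 0.0, 0.0]
    w = [0.9, 0.3, -0.1]
    c = cosine_similarity(u, w)
    print("cosine:", c, "p-value:", cosine_p_value(c, d=len(u)))
```

In high dimension, even a modest cosine similarity is very unlikely under the null hypothesis, which is why a small alignment can translate into a very small p-value.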
Main file
Radioactive_data.pdf (4.05 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02954159, version 1 (30-09-2020)

Identifiers

  • HAL Id: hal-02954159, version 1

Cite

Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou. Radioactive Data: Tracing Through Training. ICML 2020 - Thirty-seventh International Conference on Machine Learning, Jul 2020, Vienna / Virtual, Austria. pp.8326-8335. ⟨hal-02954159⟩