Conference papers

Radioactive Data: Tracing Through Training

Alexandre Sablayrolles (1), Matthijs Douze (1), Cordelia Schmid (2, 3), Hervé Jégou (1)
(2) Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
(3) WILLOW - Models of visual object recognition and scene understanding
Inria de Paris, DI-ENS - Département d'informatique de l'École normale supérieure
Abstract : Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (ImageNet), with standard architectures (ResNet-18, VGG-16, DenseNet-121) and training procedures, show that we detect radioactive data with high confidence (p < 0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.
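The abstract does not spell out how the p-value is obtained, so the following is only an illustrative sketch under stated assumptions. It assumes the mark is a secret direction drawn uniformly at random in a d-dimensional feature space, and that the test statistic is the cosine similarity between that direction and some vector read off the trained model. Under the null hypothesis (no radioactive data was used), the angle between the two vectors has density proportional to sin^(d-2), so the one-sided p-value is a ratio of two integrals. The names `cosine_p_value` and `_simpson` are hypothetical and only the Python standard library is used.

```python
import math


def _simpson(f, a, b, n=20000):
    """Composite Simpson's rule for the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def cosine_p_value(cos_sim, dim):
    """One-sided p-value P(c >= cos_sim) for the cosine similarity c between
    a fixed vector and a direction drawn uniformly on the unit sphere in
    `dim` dimensions (the assumed null hypothesis).
    """
    if dim < 2:
        raise ValueError("dim must be at least 2")
    cos_sim = max(-1.0, min(1.0, cos_sim))

    # Density of the angle theta between the two vectors, up to a constant.
    def density(theta):
        return math.sin(theta) ** (dim - 2)

    tail = _simpson(density, 0.0, math.acos(cos_sim))
    whole = _simpson(density, 0.0, math.pi)
    return tail / whole


if __name__ == "__main__":
    # Hypothetical numbers: a 512-dimensional feature space and an
    # observed cosine similarity of 0.2 with the secret direction.
    print(cosine_p_value(0.2, 512))
```

In high dimensions a random direction is almost orthogonal to any fixed vector, so even a modest positive cosine similarity yields a very small p-value under this assumed null model. That is the sense in which a detection can be reported with a confidence level.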

https://hal.inria.fr/hal-02954159
Contributor : Alexandre Sablayrolles
Submitted on : Wednesday, September 30, 2020 - 6:59:01 PM
Last modification on : Monday, October 12, 2020 - 11:16:02 AM

File

Radioactive_data.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02954159, version 1

Citation

Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou. Radioactive Data: Tracing Through Training. ICML 2020 - Thirty-seventh International Conference on Machine Learning, Jul 2020, Vienna / Virtual, Austria. ⟨hal-02954159⟩
