Conference papers

Radioactive Data: Tracing Through Training

Alexandre Sablayrolles 1 Matthijs Douze 1 Cordelia Schmid 2, 3 Hervé Jégou 1 
2 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
3 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique - ENS Paris, Inria de Paris
Abstract : Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (ImageNet), with standard architectures (ResNet-18, VGG-16, DenseNet-121) and training procedures, show that we detect radioactive data with high confidence (p < 0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.
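
The abstract reports detection confidence as a p-value but this record does not describe the statistical test. The sketch below is therefore only an illustration under a stated assumption: the mark is a secret random direction in a d-dimensional feature space, and detection measures the cosine similarity between that direction and a direction learned by the model. Under the null hypothesis (no radioactive data), the cosine between a fixed vector and a uniformly random direction has density proportional to (1 - t^2)^((d-3)/2) on [-1, 1], so a one-sided p-value is the tail mass above the observed cosine. The function name `cosine_null_pvalue` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def cosine_null_pvalue(cos_sim: float, dim: int, steps: int = 20000) -> float:
    """One-sided p-value P(C >= cos_sim), where C is the cosine between a
    fixed vector and a uniformly random direction in `dim` dimensions.

    Illustrative assumption: this null model stands in for "the model was
    not trained on marked data". It is a sketch, not the paper's exact test.
    """
    if dim < 3:
        raise ValueError("this sketch assumes dim >= 3")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    lo = max(-1.0, min(1.0, cos_sim))
    if lo >= 1.0:
        return 0.0

    # Null density: f(t) = Gamma(d/2) / (sqrt(pi) * Gamma((d-1)/2)) * (1 - t^2)^((d-3)/2)
    power = (dim - 3) / 2
    log_norm = (
        math.lgamma(dim / 2)
        - 0.5 * math.log(math.pi)
        - math.lgamma((dim - 1) / 2)
    )

    def density(t: float) -> float:
        base = 1.0 - t * t
        if base <= 0.0:
            # At the endpoints the density is constant for dim == 3, else 0.
            return math.exp(log_norm) if power == 0 else 0.0
        return math.exp(log_norm + power * math.log(base))

    # Composite Simpson integration of the density over [lo, 1].
    h = (1.0 - lo) / steps
    total = density(lo) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(lo + i * h)
    return total * h / 3
```

For example, `cosine_null_pvalue(0.3, 512)` gives the probability that a random direction in 512 dimensions aligns at least that strongly with the secret direction by chance. A small value is evidence against the null hypothesis. The integration is done in pure Python to keep the sketch dependency-free. A regularized incomplete beta function from a numerical library would be the more usual choice.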

Contributor : Alexandre Sablayrolles
Submitted on : Wednesday, September 30, 2020 - 6:59:01 PM
Last modification on : Friday, November 18, 2022 - 9:23:07 AM
Long-term archiving on : Monday, January 4, 2021 - 8:44:01 AM




  • HAL Id : hal-02954159, version 1



Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou. Radioactive Data: Tracing Through Training. ICML 2020 - Thirty-seventh International Conference on Machine Learning, Jul 2020, Vienna / Virtual, Austria. pp.8326-8335. ⟨hal-02954159⟩


