Probabilistic k$^m$-anonymity: Efficient Anonymization of Large Set-Valued Datasets

Acs Gergely 1, Jagdish Prasad Achara 1, Claude Castelluccia 1
1 PRIVATICS - Privacy Models, Architectures and Tools for the Information Society
Inria Grenoble - Rhône-Alpes, CITI - CITI Centre of Innovation in Telecommunications and Integration of services
Abstract: A set-valued dataset contains different types of items or values per individual, for example visited locations, purchased goods, watched movies, or search queries. Because it is relatively easy to re-identify individuals in such datasets, their release poses significant privacy threats. Hence, organizations aiming to share such datasets must adhere to personal data regulations. To be exempt from these regulations while still benefiting from sharing, these datasets should be anonymized before their release. In this paper, we revisit the problem of anonymizing set-valued data. We argue that anonymization techniques targeting the traditional $k^m$-anonymity model, which limits the adversarial background knowledge to at most $m$ items per individual, are impractical for large real-world datasets. Hence, we propose a probabilistic relaxation of $k^m$-anonymity and present an anonymization technique to achieve it. This relaxation also improves the utility of the anonymized data. We also demonstrate the effectiveness of our scalable anonymization technique on a real-world location dataset consisting of more than 4 million subscribers of a large European telecom operator. We believe that our technique can be very appealing for practitioners willing to share such large datasets.
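As an illustration of the traditional $k^m$-anonymity model mentioned in the abstract, the following minimal Python sketch checks the standard condition: every combination of at most $m$ items that appears in some record must appear in at least $k$ records. It is not the authors' anonymization technique and does not implement their probabilistic relaxation; the function names `km_violations` and `is_km_anonymous` and the toy data are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def km_violations(records, k, m):
    """Return itemsets of size <= m that occur in at least one record
    but in fewer than k records (hypothetical helper for illustration)."""
    support = Counter()
    for record in records:
        items = sorted(set(record))
        # Enumerate every itemset of size 1..m contained in this record.
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    return {itemset: count for itemset, count in support.items() if count < k}


def is_km_anonymous(records, k, m):
    """True if no itemset of size <= m is supported by fewer than k records."""
    return not km_violations(records, k, m)


# Toy data (assumed): each record is the set of locations one individual visited.
toy_records = [
    {"A", "B"},
    {"A", "B"},
    {"A", "C"},
    {"A", "C"},
]

print(is_km_anonymous(toy_records, k=2, m=2))
print(km_violations(toy_records + [{"A", "B", "C"}], k=2, m=2))
```

The exhaustive enumeration visits every itemset of size up to $m$ in every record, so its cost grows combinatorially with record length and $m$. This brute-force check is only meant to make the definition concrete on small inputs.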
Document type: Conference paper
IEEE International Conference on Big Data (BigData) 2015, Oct 2015, Santa Clara, United States
Contributor: Acs Gergely
Submitted on: Friday, 25 September 2015 - 16:10:55
Last modified on: Thursday, 19 November 2015 - 01:20:51
Document(s) archived on: Tuesday, 29 December 2015 - 09:22:16


Files produced by the author(s)


  • HAL Id: hal-01205533, version 1



Acs Gergely, Jagdish Prasad Achara, Claude Castelluccia. Probabilistic k$^m$-anonymity: Efficient Anonymization of Large Set-Valued Datasets. IEEE International Conference on Big Data (BigData) 2015, Oct 2015, Santa Clara, United States. <hal-01205533>


