
Probabilistic k$^m$-anonymity: Efficient Anonymization of Large Set-Valued Datasets

Acs Gergely 1, Jagdish Prasad Achara 1, Claude Castelluccia 1
1 PRIVATICS - Privacy Models, Architectures and Tools for the Information Society
Inria Grenoble - Rhône-Alpes, CITI - CITI Centre of Innovation in Telecommunications and Integration of services
Abstract : A set-valued dataset contains different types of items/values per individual, for example, visited locations, purchased goods, watched movies, or search queries. As it is relatively easy to re-identify individuals in such datasets, their release poses significant privacy threats. Hence, organizations aiming to share such datasets must comply with personal data regulations. To fall outside the scope of these regulations while still benefiting from sharing, these datasets should be anonymized before their release. In this paper, we revisit the problem of anonymizing set-valued data. We argue that anonymization techniques targeting the traditional $k^m$-anonymity model, which limits the adversarial background knowledge to at most $m$ items per individual, are impractical for large real-world datasets. Hence, we propose a probabilistic relaxation of $k^m$-anonymity and present an anonymization technique to achieve it. This relaxation also improves the utility of the anonymized data. We demonstrate the effectiveness of our scalable anonymization technique on a real-world location dataset covering more than 4 million subscribers of a large European telecom operator. We believe that our technique can be very appealing to practitioners willing to share such large datasets.
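As an illustration of the traditional $k^m$-anonymity model mentioned in the abstract, the following is a minimal sketch, not the authors' technique. It assumes the commonly used definition: every combination of at most $m$ items that occurs in some record must occur in at least $k$ records. The function name `is_km_anonymous` and the toy records are hypothetical. The sketch enumerates every itemset of size up to $m$ in every record, and that count grows quickly with record length, which gives some intuition for why exact checks become costly on large datasets.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(records, k, m):
    """Check the (assumed) k^m-anonymity condition by brute force.

    records: iterable of item collections, one per individual.
    Returns True if every itemset of size <= m that appears in at
    least one record appears in at least k records.
    """
    support = Counter()
    for record in records:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(record))
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    return all(count >= k for count in support.values())


# Hypothetical toy data: each set lists the items of one individual.
toy_records = [
    {"a", "b"},
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
]

# Every single item occurs in at least 2 records.
single_item_ok = is_km_anonymous(toy_records, k=2, m=1)
# The pair ("b", "c") occurs in only one record, so the check fails for m=2.
pair_ok = is_km_anonymous(toy_records, k=2, m=2)
```

With this toy data, an adversary who knows one item cannot narrow an individual down to fewer than two records, but an adversary who knows both "b" and "c" singles out one record.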
Document type :
Conference papers

Cited literature [26 references]
Contributor : Acs Gergely
Submitted on : Friday, September 25, 2015 - 4:10:55 PM
Last modification on : Friday, December 10, 2021 - 1:16:03 PM
Long-term archiving on : Tuesday, December 29, 2015 - 9:22:16 AM


Files produced by the author(s)


  • HAL Id : hal-01205533, version 1



Acs Gergely, Jagdish Prasad Achara, Claude Castelluccia. Probabilistic k$^m$-anonymity: Efficient Anonymization of Large Set-Valued Datasets. IEEE International Conference on Big Data (BigData) 2015, Oct 2015, Santa Clara, United States. ⟨hal-01205533⟩


