Probabilistic k$^m$-anonymity: Efficient Anonymization of Large Set-Valued Datasets

Acs Gergely¹, Jagdish Prasad Achara¹, Claude Castelluccia¹
¹ PRIVATICS - Privacy Models, Architectures and Tools for the Information Society
Inria Grenoble - Rhône-Alpes, CITI - CITI Centre of Innovation in Telecommunications and Integration of services
Abstract: A set-valued dataset contains multiple items or values per individual, for example, visited locations, purchased goods, watched movies, or search queries. Because individuals are relatively easy to re-identify in such datasets, releasing them poses significant privacy threats, and organizations that wish to share them must comply with personal data regulations. To be exempt from these regulations while still benefiting from data sharing, such datasets should be anonymized before release. In this paper, we revisit the problem of anonymizing set-valued data. We argue that anonymization techniques targeting the traditional k$^m$-anonymity model, which limits the adversary's background knowledge to at most m items per individual, are impractical for large real-world datasets. We therefore propose a probabilistic relaxation of k$^m$-anonymity and present an anonymization technique that achieves it. This relaxation also improves the utility of the anonymized data. We demonstrate the effectiveness of our scalable anonymization technique on a real-world location dataset covering more than 4 million subscribers of a large European telecom operator. We believe that our technique can be very appealing to practitioners who wish to share such large datasets.
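As a point of reference for the model the abstract relaxes, the sketch below checks the standard (non-probabilistic) k$^m$-anonymity condition as it is commonly stated in the literature: every itemset of at most m items that occurs in the data must be contained in at least k records. It is not the authors' probabilistic technique; the function name `is_km_anonymous` and the toy records are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Iterable, Set


def is_km_anonymous(records: Iterable[Set[Hashable]], k: int, m: int) -> bool:
    """Return True if every itemset of size 1..m that occurs in some record
    is contained in at least k records (standard k^m-anonymity)."""
    support: Counter = Counter()
    for record in records:
        items = list(set(record))  # deduplicate items within one record
        for size in range(1, m + 1):
            # Each distinct subset of this record is counted once per record.
            for subset in combinations(items, size):
                support[frozenset(subset)] += 1
    # Itemsets that never occur have support 0 and are not constrained.
    return all(count >= k for count in support.values())


# Toy example: every occurring itemset of size <= 2 is shared by >= 2 records.
records = [{"A", "B"}, {"A", "B"}, {"A"}]
print(is_km_anonymous(records, k=2, m=2))  # True
```

This exhaustive check enumerates all subsets of up to m items in every record, so a record with n distinct items contributes C(n,1) + ... + C(n,m) itemsets; on records with many items this count grows quickly.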
Document type: Conference paper

Cited literature: 26 references

https://hal.inria.fr/hal-01205533
Contributor: Acs Gergely
Submitted on: Friday, September 25, 2015 - 4:10:55 PM
Last modification on: Friday, June 21, 2019 - 9:52:21 AM
Long-term archiving on: Tuesday, December 29, 2015 - 9:22:16 AM

File: paper.pdf (produced by the author(s))

Identifier: HAL Id: hal-01205533, version 1

Citation

Acs Gergely, Jagdish Prasad Achara, Claude Castelluccia. Probabilistic k$^m$-anonymity: Efficient Anonymization of Large Set-Valued Datasets. IEEE International Conference on Big Data (BigData) 2015, Oct 2015, Santa Clara, United States. ⟨hal-01205533⟩

