Differentially Private Sequential Data Publication via Variable-Length N-Grams

Abstract: Sequential data is increasingly used in a variety of applications, and publishing it is vital to their advancement. However, as the re-identification attacks on the AOL and Netflix datasets showed, releasing sequential data may pose considerable threats to individual privacy. Recent research has indicated that existing sanitization techniques fail to provide the privacy guarantees they claim. It is therefore urgent to respond to this failure by developing new schemes with provable privacy guarantees. Differential privacy is one of the few models that can provide such guarantees. Because sequential data is inherently sequential and high-dimensional, applying differential privacy to it is challenging. In this paper, we address this challenge by employing a variable-length n-gram model, which extracts the essential information of a sequential database in terms of a set of variable-length n-grams. Our approach makes use of a carefully designed exploration tree structure and a set of novel techniques based on the Markov assumption in order to lower the magnitude of added noise. The published n-grams are useful for many purposes. Furthermore, we develop a solution for generating a synthetic database, which enables a wider spectrum of data analysis tasks. Extensive experiments on real-life datasets demonstrate that our approach substantially outperforms the state-of-the-art techniques.
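To make the idea of noisy n-gram counts concrete, the following is a minimal sketch of the standard Laplace mechanism applied to fixed-length n-gram counts. It is not the paper's variable-length exploration tree method, which the abstract does not specify. The function names, the `l_max` truncation parameter, and the sensitivity bound used here are assumptions made for illustration, and the sketch takes neighboring databases to differ by one whole sequence.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_ngram_counts(sequences, alphabet, n, epsilon, l_max, rng=None):
    """Return Laplace-noised counts for every n-gram over `alphabet`.

    Hypothetical helper: `l_max` is an assumed cap on sequence length.
    """
    rng = rng or random.Random()
    counts = Counter()
    for seq in sequences:
        seq = seq[:l_max]  # truncate so one sequence has bounded influence
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1

    # A truncated sequence contains at most l_max - n + 1 n-grams, so adding
    # or removing one sequence changes the count vector by at most that much.
    sensitivity = max(l_max - n + 1, 1)
    scale = sensitivity / epsilon

    # Noise is added to every n-gram in the domain, including unobserved
    # ones; otherwise the set of released keys would itself leak information.
    return {
        gram: counts[gram] + laplace_noise(scale, rng)
        for gram in itertools.product(alphabet, repeat=n)
    }
```

Enumerating the whole domain costs `len(alphabet) ** n` entries, so this direct approach only works for a small alphabet and small `n`. That growth, together with the noise added to each entry, illustrates why the high dimensionality mentioned in the abstract makes the problem hard.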
Document type:
Conference paper
ACM Computer and Communication Security (CCS), Oct 2012, Raleigh, United States. 2012

https://hal.inria.fr/hal-00747830
Contributor: Claude Castelluccia
Submitted on: Friday, November 2, 2012 - 11:28:19
Last modified on: Thursday, July 26, 2018 - 14:08:02
Document(s) archived on: Saturday, December 17, 2016 - 07:27:01

File

Differentially_private_sequent...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00747830, version 1

Citation

Rui Chen, Gergely Acs, Claude Castelluccia. Differentially Private Sequential Data Publication via Variable-Length N-Grams. ACM Computer and Communication Security (CCS), Oct 2012, Raleigh, United States. 2012. 〈hal-00747830〉
