The Pitfalls of Hashing for Privacy

Abstract : Boosted by recent legislations, data anonymization is fast becoming a norm. However, as of yet no generic solution has been found to safely release data. As a consequence, data custodians often resort to ad-hoc means to anonymize datasets. Both past and current practices indicate that hashing is often believed to be an effective way to anonymize data. Unfortunately, in practice it is only rarely effective. This paper is a tutorial to explain the limits of cryptographic hash functions as an anonymization technique. Anonymity set is the best privacy model that can be achieved by hash functions. However, this model has several shortcomings. We provide three case studies to illustrate how hashing only yields a weakly anonymized data. The case studies include MAC and email address anonymization as well as the analysis of Google Safe Browsing.Boosted by recent legislations, data anonymization is fast becoming a norm. However, as of yet no generic solution has been found to safely release data. As a consequence, data custodians often resort to ad-hoc means to anonymize datasets. Both past and current practices indicate that hashing is often believed to be an effective way to anonymize data. Unfortunately, in practice it is only rarely effective. This paper is a tutorial to explain the limits of cryptographic hash functions as an anonymization technique. Anonymity set is the best privacy model that can be achieved by hash functions. However, this model has several shortcomings. We provide three case studies to illustrate how hashing only yields a weakly anonymized data. The case studies include MAC and email address anonymization as well as the analysis of Google Safe Browsing.
Type de document :
Article dans une revue
Communications Surveys and Tutorials, IEEE Communications Society, Institute of Electrical and Electronics Engineers, 2018
Liste complète des métadonnées

https://hal.inria.fr/hal-01589210
Contributeur : Cédric Lauradoux <>
Soumis le : lundi 18 septembre 2017 - 12:30:17
Dernière modification le : mardi 19 septembre 2017 - 01:08:57

Identifiants

  • HAL Id : hal-01589210, version 1

Collections

Citation

Levent Demir, Amrit Kumar, Mathieu Cunche, Cédric Lauradoux. The Pitfalls of Hashing for Privacy. Communications Surveys and Tutorials, IEEE Communications Society, Institute of Electrical and Electronics Engineers, 2018. 〈hal-01589210〉

Partager

Métriques

Consultations de la notice

88