Conference papers

Privacy guarantees for de-identifying text transformations

Abstract: Machine learning approaches to natural language processing tasks benefit from a comprehensive collection of real-life user data. At the same time, there is a clear need to protect the privacy of the users whose data is collected and processed. For text collections such as transcripts of voice interactions or patient records, replacing sensitive parts with benign alternatives can provide de-identification. However, how much privacy do such text transformations actually guarantee, and are the resulting texts still useful for machine learning? In this paper, we derive formal privacy guarantees for general text transformation-based de-identification methods on the basis of differential privacy. We also measure the effect that different ways of masking private information in dialog transcripts have on a subsequent machine learning task. To this end, we formulate different masking strategies and compare their privacy-utility trade-offs. In particular, we compare a simple redaction approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks, such as named entity recognition, intent detection, and dialog act classification. We find that only word-by-word replacement is robust against performance drops across these tasks.
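To make the contrast between the two masking strategies concrete, here is a minimal Python sketch. It assumes tokens already carry entity labels (for example from a named entity recognizer); the label names `PER`/`LOC`, the `[MASK]` placeholder, the surrogate lists, and the functions `redact` and `replace_word_by_word` are all hypothetical. The replacement step draws surrogates uniformly at random, a deliberately simpler stand-in for the deep learning replacement models the paper uses, and the sketch says nothing about the differential privacy guarantees derived in the paper.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Each token is paired with an entity label; None marks a non-sensitive token.
# Labels and surrogate lists are hypothetical illustrations, not the paper's data.
Token = Tuple[str, Optional[str]]

SURROGATES: Dict[str, List[str]] = {
    "PER": ["Alice", "Bob", "Chen", "Maria"],
    "LOC": ["Berlin", "Lagos", "Osaka", "Lima"],
}


def redact(tokens: Sequence[Token], placeholder: str = "[MASK]") -> List[str]:
    """Redaction: replace every sensitive token with the same placeholder."""
    return [placeholder if label else word for word, label in tokens]


def replace_word_by_word(tokens: Sequence[Token], rng: random.Random) -> List[str]:
    """Word-by-word replacement: swap each sensitive token for a surrogate
    of the same entity type.

    A uniform random draw stands in here for the deep learning replacement
    models described in the abstract (an assumption of this sketch).
    """
    out: List[str] = []
    for word, label in tokens:
        if label is None:
            out.append(word)  # keep non-sensitive tokens unchanged
        else:
            # Prefer a surrogate that differs from the original word.
            candidates = [c for c in SURROGATES[label] if c != word] or SURROGATES[label]
            out.append(rng.choice(candidates))
    return out


if __name__ == "__main__":
    utterance = [("Call", None), ("Maria", "PER"), ("in", None), ("Berlin", "LOC")]
    print(" ".join(redact(utterance)))
    print(" ".join(replace_word_by_word(utterance, random.Random(0))))
```

Redaction turns the example into `Call [MASK] in [MASK]`, collapsing every sensitive slot to one symbol; replacement keeps a plausible word of the same entity type in each slot, which is one plausible reason downstream tasks such as named entity recognition suffer less from it.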

Cited literature: 20 references

https://hal.inria.fr/hal-02907939
Contributor: Emmanuel Vincent
Submitted on: Friday, August 7, 2020 - 9:00:12 PM
Last modification on: Monday, August 10, 2020 - 8:50:07 AM

File

adelani_IS20.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02907939, version 1


Citation

David Adelani, Ali Davody, Thomas Kleinbauer, Dietrich Klakow. Privacy guarantees for de-identifying text transformations. INTERSPEECH 2020, Oct 2020, Shanghai, China. ⟨hal-02907939⟩
