Differentially private speaker anonymization

Ali Shahin Shamsabadi; Brij Mohan Lal Srivastava; Aurélien Bellet; Nathalie Vauquier; Emmanuel Vincent; Mohamed Maouche; Marc Tommasi; Nicolas Papernot

doi:10.48550/arXiv.2202.11823

Journal article, Proceedings on Privacy Enhancing Technologies, Year: 2023

Ali Shahin Shamsabadi

Role: Author

The Alan Turing Institute

Vector Institute

Machine Learning in Information Networks

Brij Mohan Lal Srivastava

Role: Author

Machine Learning in Information Networks

Aurélien Bellet

Role: Author
PersonId : 782872
IdRef : 17653136X

Machine Learning in Information Networks

Nathalie Vauquier

Role: Author

Machine Learning in Information Networks

Emmanuel Vincent

Role: Author
PersonId : 1256
IdHAL : emmanuelv
ORCID : 0000-0002-0183-7289
IdRef : 089360176

Speech Modeling for Facilitating Oral-Based Communication

Mohamed Maouche

Role: Author

Machine Learning in Information Networks

Marc Tommasi

Role: Author
PersonId : 399
IdHAL : marc-tommasi
ORCID : 0000-0003-2838-4408
IdRef : 121846385

Machine Learning in Information Networks

Nicolas Papernot

Role: Author

Vector Institute

Department of Computer Science [University of Toronto]

Abstract

Sharing real-world speech utterances is key to the training and deployment of voice-based services. However, it also raises privacy risks, as speech contains a wealth of personal data. Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact. State-of-the-art techniques operate by disentangling the speaker information (represented via a speaker embedding) from these attributes and re-synthesizing speech based on the speaker embedding of another speaker. Prior research in the privacy community has shown that anonymization often provides brittle privacy protection, let alone any provable guarantee. In this work, we show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information. We remove speaker information from these attributes by introducing differentially private feature extractors based on an automatic speech recognizer and an autoencoder, respectively, trained using noise layers. We plug these extractors into the state-of-the-art anonymization pipeline and generate, for the first time, differentially private utterances with a provable upper bound on the speaker information they contain. We empirically evaluate the privacy and utility of our differentially private speaker anonymization approach on the LibriSpeech data set. Experimental results show that the generated utterances retain very high utility for automatic speech recognition training and inference, while being much better protected against strong adversaries who leverage full knowledge of the anonymization process to try to infer the speaker identity.
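The "noise layers" mentioned in the abstract can be illustrated with a standard differential privacy primitive. The Python sketch below assumes the Laplace mechanism applied to a single feature vector: the vector is first rescaled so its L1 norm is at most `clip_norm`, which bounds the L1 sensitivity by `2 * clip_norm`, and Laplace noise with scale `sensitivity / epsilon` is then added. The function name `laplace_noise_layer`, the L1 clipping step, and treating one feature vector as the unit of privacy are illustrative assumptions, not a description of the authors' exact extractors.

```python
import numpy as np


def laplace_noise_layer(features, epsilon, clip_norm=1.0, rng=None):
    """Hypothetical DP noise layer: L1-clip a feature vector, then add Laplace noise.

    Any two vectors inside the L1 ball of radius clip_norm differ by at most
    2 * clip_norm in L1 distance, so adding Laplace noise with scale
    (2 * clip_norm) / epsilon makes this single release epsilon-DP
    (Laplace mechanism).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(features, dtype=float)

    # Rescale (not truncate) so the L1 norm is at most clip_norm.
    l1 = np.abs(x).sum()
    if l1 > clip_norm:
        x = x * (clip_norm / l1)

    # L1 sensitivity of the clipped vector, then calibrated Laplace noise.
    sensitivity = 2.0 * clip_norm
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=x.shape)
    return x + noise
```

For example, `laplace_noise_layer(np.array([3.0, -1.0]), epsilon=1.0, rng=np.random.default_rng(0))` first rescales the input to `[0.75, -0.25]` and then perturbs it. Smaller `epsilon` values give a stronger guarantee but noisier features, which is the privacy/utility trade-off such a layer introduces.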

Domains

Signal and Image Processing [eess.SP]

https://inria.hal.science/hal-03588932

Submitted on: Friday, 25 February 2022, 11:49:08

Last modified on: Thursday, 1 February 2024, 10:05:10

Dates and versions

hal-03588932 , version 1 (25-02-2022)

Identifiers

HAL Id : hal-03588932 , version 1
ARXIV : 2202.11823
DOI : 10.48550/arXiv.2202.11823

Cite

Ali Shahin Shamsabadi, Brij Mohan Lal Srivastava, Aurélien Bellet, Nathalie Vauquier, Emmanuel Vincent, et al. Differentially private speaker anonymization. Proceedings on Privacy Enhancing Technologies, 2023, 2023 (1), ⟨10.48550/arXiv.2202.11823⟩. ⟨hal-03588932⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA CRISTAL UNIV-LORRAINE INRIA2 CRISTAL-MAGNET LORIA LORIA-NLPKD UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UNIV-LILLE HYAIAI ANR UR1-MATH-NUM
