Preventing author profiling through zero-shot multilingual back-translation

David Ifeoluwa Adelani; Miaoran Zhang; Xiaoyu Shen; Ali Davody; Thomas Kleinbauer; Dietrich Klakow

Conference paper, Year: 2021

David Ifeoluwa Adelani

Role: Author
PersonId : 1073845

Saarland University [Saarbrücken]

Miaoran Zhang

Role: Author
PersonId : 1110825

Saarland University [Saarbrücken]

Xiaoyu Shen

Role: Author
PersonId : 1110826

Saarland University [Saarbrücken]

Ali Davody

Role: Author
PersonId : 1073846

Saarland University [Saarbrücken]

Thomas Kleinbauer

Role: Author
PersonId : 1073842

Saarland University [Saarbrücken]

Dietrich Klakow

Role: Author
PersonId : 1095147

Saarland University [Saarbrücken]

Abstract

Documents as short as a single sentence may inadvertently reveal sensitive information about their authors, such as their gender or ethnicity. Style transfer is an effective way of transforming texts in order to remove any information that enables author profiling. However, for a number of current state-of-the-art approaches, the improved privacy is accompanied by an undesirable drop in the downstream utility of the transformed data. In this paper, we propose a simple, zero-shot way to effectively lower the risk of author profiling through multilingual back-translation using off-the-shelf translation models. We compare our models with five representative text style transfer models on three datasets across different domains. Results from both an automatic and a human evaluation show that our approach achieves the best overall performance while requiring no training data. We are able to lower the adversarial prediction of gender and race by up to 22% while retaining 95% of the original utility on downstream tasks.
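
The back-translation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which pivot languages or translation models are used, so the `translate` callable, its `(text, source_lang, target_lang)` signature, and the `pivots` list are all hypothetical placeholders for whatever off-the-shelf translation system is available.

```python
from typing import Callable, Iterable

# Hypothetical interface: translate(text, source_lang, target_lang) -> text.
# Any off-the-shelf machine translation system could be wrapped to fit it.
Translator = Callable[[str, str, str], str]


def back_translate(
    text: str,
    translate: Translator,
    pivots: Iterable[str],
    source_lang: str = "en",
) -> str:
    """Route `text` through each pivot language in order, then back to the source."""
    current, current_lang = text, source_lang
    for pivot in pivots:
        # Each hop re-expresses the sentence in another language.
        current = translate(current, current_lang, pivot)
        current_lang = pivot
    if current_lang != source_lang:
        # Final hop returns to the original language.
        current = translate(current, current_lang, source_lang)
    return current


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Stand-in for a real translation model; it only lowercases the text."""
    return text.lower()
```

Calling `back_translate("Some TEXT", toy_translate, ["de"])` performs two hops (en to de, de to en). With a real translation model in place of `toy_translate`, the round trip would tend to replace author-specific wording with the model's own phrasing, which is the effect the abstract relies on.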

Domains

Computation and Language [cs.CL]

Main file

adelani_EMNLP2021.pdf (233.62 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Vincent

https://inria.hal.science/hal-03350906

Submitted on: Tuesday, September 21, 2021, 16:47:28

Last modified on: Wednesday, September 22, 2021, 03:08:40

Long-term archiving on: Wednesday, December 22, 2021, 19:16:09

Dates and versions

hal-03350906, version 1 (21-09-2021)

Identifiers

HAL Id: hal-03350906, version 1

Cite

David Ifeoluwa Adelani, Miaoran Zhang, Xiaoyu Shen, Ali Davody, Thomas Kleinbauer, et al. Preventing author profiling through zero-shot multilingual back-translation. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2021, Punta Cana, Dominican Republic. ⟨hal-03350906⟩
