Privacy leakages on NLP models and mitigations through a use case on medical data

Gaspard Berthelier; Antoine Boutet; Antoine Richard

Conference paper, Year: 2023


Gaspard Berthelier

Role: Author

Privacy Models, Architectures and Tools for the Information Society

Antoine Boutet

Role: Author

Privacy Models, Architectures and Tools for the Information Society

Antoine Richard

Role: Author

Hospices Civils de Lyon

SIMulating and Building IOT

Abstract

Patient medical data is extremely sensitive and is therefore subject to numerous regulations that require anonymization before the data can be disseminated. Anonymizing medical documents is a complex task, but recent advances in NLP models have shown encouraging results. Nevertheless, privacy risks associated with NLP models may remain. In this paper, we present the main privacy concerns in NLP and a case study, conducted in collaboration with the Hospices Civils de Lyon (HCL), that uses NLP models to anonymize medical data.

Keywords

machine learning, natural language processing, membership inference, data extraction, differential privacy, federated learning, medical data, anonymization

Domains

Artificial Intelligence [cs.AI]

Main file

compas2023.pdf (811.64 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-04138528

Submitted on: Friday, June 23, 2023, 09:18:50

Last modified on: Monday, January 8, 2024, 16:07:32

Long-term archiving on: Sunday, September 24, 2023, 18:12:15

Dates and versions

hal-04138528, version 1 (23-06-2023)

License

Attribution

Identifiers

HAL Id: hal-04138528, version 1

Cite

Gaspard Berthelier, Antoine Boutet, Antoine Richard. Privacy leakages on NLP models and mitigations through a use case on medical data. COMPAS 2023 - Conférence francophone d'informatique en Parallélisme, Architecture et Système, LISTIC - Laboratoire d’Informatique, Systèmes, Traitement de l’Information et de la Connaissance / USBM - Université Savoie Mont Blanc, Jul 2023, Annecy, France. pp.1-8. ⟨hal-04138528⟩


Collections

HCL CNRS INRIA INSA-LYON UNIV-LORRAINE INRIA2 LORIA LORIA-NSS CITI INSA-GROUPE UDL

