Privacy Preserving Synthetic Health Data

Andrew Yale; Saloni Dash; Ritik Dutta; Isabelle Guyon; Adrien Pavao; Kristin P Bennett

Conference paper, 2019

Andrew Yale: Rensselaer Polytechnic Institute

Saloni Dash: Birla Institute of Technology and Science

Ritik Dutta: Indian Institute of Technology [Gandhinagar]

Isabelle Guyon (HAL PersonId: 963159): TAckling the Underspecified; Laboratoire de Recherche en Informatique; Université Paris-Sud - Paris 11 - Faculté des Sciences

Adrien Pavao (HAL PersonId: 1049181; IdHAL: adrien-pavao; ORCID: 0000-0001-7374-5095): Laboratoire de Recherche en Informatique; Université Paris-Sud - Paris 11 - Faculté des Sciences; TAckling the Underspecified

Kristin P Bennett: Rensselaer Polytechnic Institute

Abstract

We examine the feasibility of using synthetic medical data generated by generative adversarial networks (GANs) in the classroom to teach data science in health informatics. We present an end-to-end methodology that retains instructional utility while preserving privacy to a level that meets regulatory requirements: (1) a GAN is trained inside a secure environment by a certified agent who is aware of medical-data security requirements; (2) the final GAN model is then used outside the secure environment by external users (instructors or researchers) to generate synthetic data. This second step simplifies data handling for external users because it avoids de-identification, which may require special user training, be costly, and/or reduce data fidelity. We benchmark our proposed GAN against various baseline methods using a novel set of metrics. At equal levels of privacy and utility, GANs yield models with a small footprint, which meets the specifications of our application domain. The data, the code, and a challenge we organized for educational purposes are available.
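The abstract does not define its privacy and utility metrics. As an illustration only, the Python sketch below computes a nearest-neighbour adversarial accuracy, one way to score how distinguishable synthetic records are from real ones, plus a privacy-loss score derived from it. The function names, the Euclidean distance, and the privacy-loss definition (test-set score minus training-set score) are assumptions made for this sketch and are not necessarily the metrics used in the paper.

```python
import numpy as np


def _nn_distances(a, b, same_set=False):
    """Distance from each row of `a` to its nearest row in `b` (Euclidean)."""
    # Full pairwise distance matrix via broadcasting: O(len(a) * len(b)) memory.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    if same_set:
        # When a and b are the same set, ignore each point's zero distance to itself.
        np.fill_diagonal(d, np.inf)
    return d.min(axis=1)


def adversarial_accuracy(real, synthetic):
    """Accuracy of a 1-nearest-neighbour 'real vs synthetic' classifier.

    ~0.5: the sets are indistinguishable to 1-NN (the desired regime);
    ~1.0: synthetic data does not resemble the real data (low utility);
    ~0.0: synthetic points sit on top of real points (possible copying).
    """
    d_rs = _nn_distances(real, synthetic)
    d_rr = _nn_distances(real, real, same_set=True)
    d_sr = _nn_distances(synthetic, real)
    d_ss = _nn_distances(synthetic, synthetic, same_set=True)
    # A point is "correctly classified" if its nearest neighbour is in its own set.
    return 0.5 * (np.mean(d_rs > d_rr) + np.mean(d_sr > d_ss))


def privacy_loss(train, test, synthetic):
    """Hypothetical privacy score: how much closer synthetic data is to the
    training records the generator saw than to held-out records it never saw."""
    return adversarial_accuracy(test, synthetic) - adversarial_accuracy(train, synthetic)
```

A large positive `privacy_loss` would suggest the generator memorised its training records. The pairwise matrix is only practical for a few thousand rows; for larger tables a spatial index such as `scipy.spatial.cKDTree` would be a natural replacement for `_nn_distances`.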

Subjects

Artificial Intelligence [cs.AI]; Statistics [math.ST]; Machine Learning [stat.ML]; Human medicine and pathology

Main file

ESANN_2019.pdf (1.12 MB)

Origin: files produced by the author(s)

Contributor: Adrien Pavao

https://inria.hal.science/hal-02160496

Submitted on: Friday, 21 June 2019, 09:39:29

Last modified on: Monday, 22 April 2024, 10:06:24

Dates and versions

hal-02160496, version 1 (21-06-2019)

Identifiers

HAL Id: hal-02160496, version 1

Cite

Andrew Yale, Saloni Dash, Ritik Dutta, Isabelle Guyon, Adrien Pavao, et al. Privacy Preserving Synthetic Health Data. ESANN 2019 - European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Apr 2019, Bruges, Belgium. ⟨hal-02160496⟩


Collections

CNRS INRIA UMR8623 CENTRALESUPELEC INRIA2 LRI-AO UNIV-PARIS-SACLAY LISN GS-ENGINEERING GS-COMPUTER-SCIENCE LISN-AO

1619 views

2048 downloads
