Conference paper, year: 2021

An open generator of synthetic administrative healthcare databases

Abstract

Recent advances in data analysis create opportunities to improve healthcare systems through the analysis of health databases. However, this thirst for data conflicts with preserving the privacy of individuals. Generating synthetic datasets may foster research on healthcare data analytics, but most generators rely on generative statistical models fitted on real data and therefore still require access to sensitive data. This article proposes a probabilistic relational model fitted on publicly available datasets. Public healthcare statistics provide valuable information for mimicking statistical distributions and do not hold sensitive personal data. More specifically, we propose to generate a synthetic version of the national database of French insured patients. We provide not only synthetic datasets but also a dataset generator that can be used without any data access request. Experiments compare official statistics with those computed on synthetic datasets.
Main file
datasynth.pdf (275.16 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03326618, version 1 (26-08-2021)

Identifiers

  • HAL Id: hal-03326618, version 1

Cite

Thomas Guyet, Tristan Allard, Johanne Bakalara, Olivier Dameron. An open generator of synthetic administrative healthcare databases. IAS 2021 - Atelier Intelligence Artificielle et Santé, Jun 2021, Bordeaux (virtual), France. pp.1-8. ⟨hal-03326618⟩
