Privacy-Preserving Anomaly Detection Using Synthetic Data

Rudolf Mayer; Markus Hittmeir; Andreas Ekelhart

doi:10.1007/978-3-030-49669-2_11

Conference paper, Year: 2020

Rudolf Mayer (SBA Research)
Markus Hittmeir (SBA Research)
Andreas Ekelhart (SBA Research)

Abstract

With the ever-increasing capacity for collecting, storing, and processing data, there is also a high demand for intelligent knowledge discovery and data analysis methods. While machine learning and related domains have seen impressive advances in recent years, these advances also raise concerns about the protection of personal and otherwise sensitive data, especially when the data is to be analysed by third parties, e.g. in collaborative settings where it is exchanged to train more powerful models. One such scenario is anomaly detection, which aims to identify rare items, events, or observations that differ from the majority of the data. Such anomalous items, also referred to as outliers, often correspond to problematic cases, e.g. bank fraud, rare medical diseases, or intrusions such as attacks on IT systems.

Besides anonymisation, which becomes difficult to achieve especially with high-dimensional data, one approach to privacy-preserving data mining is the use of synthetic data. Synthetic data comes with the promise of protecting the users' data while producing analysis results close to those achieved with real data. However, since most synthetisation methods protect sensitive data by preserving global properties rather than the characteristics of individual records, this form of data might be inadequate due to a lack of realistic outliers.

In this paper, we therefore analyse a number of different approaches for creating synthetic data. We study the utility of the created datasets for anomaly detection in supervised, semi-supervised, and unsupervised settings, and compare it to the baseline of the original data.
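The concern about missing outliers can be illustrated with a small toy sketch. It is not the method, the synthesizers, or the datasets evaluated in the paper: the one-dimensional data, the Gaussian "synthesizer" that keeps only the global mean and standard deviation, the mean-plus-k-sigma detector, and all function names are assumptions made purely for illustration.

```python
import random
import statistics


def make_real_data(n_normal=1000, n_outliers=10, seed=0):
    """Toy 1-D 'real' dataset: (value, label) pairs, label 1 marks an outlier."""
    rng = random.Random(seed)
    normal = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_normal)]
    outliers = [(rng.uniform(8.0, 10.0), 1) for _ in range(n_outliers)]
    return normal + outliers


def synthesize(values, n, seed=1):
    """Hypothetical synthesizer: keeps only the global mean and standard deviation."""
    rng = random.Random(seed)
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def fit_threshold(train_values, k=3.0):
    """Unsupervised detector: flag anything above mean + k * std of the training data."""
    return statistics.fmean(train_values) + k * statistics.stdev(train_values)


def recall(threshold, labelled):
    """Fraction of labelled outliers that lie above the threshold."""
    outliers = [v for v, label in labelled if label == 1]
    return sum(v > threshold for v in outliers) / len(outliers)


real = make_real_data()
real_values = [v for v, _ in real]
synthetic_values = synthesize(real_values, n=len(real_values))

# How many extreme records does each dataset contain?
extreme_real = sum(v >= 8.0 for v in real_values)
extreme_synthetic = sum(v >= 8.0 for v in synthetic_values)

# Train the same detector on each dataset, then evaluate both on the real records.
recall_real = recall(fit_threshold(real_values), real)
recall_synthetic = recall(fit_threshold(synthetic_values), real)

print("extreme records (real, synthetic):", extreme_real, extreme_synthetic)
print("recall on real outliers (real-trained, synthetic-trained):",
      recall_real, recall_synthetic)
```

In this toy setting the synthetic sample reproduces the global statistics but contains essentially no records in the outlier range, so a supervised learner would have no anomalous examples to train on, whereas a simple unsupervised threshold fitted on the synthetic sample can still flag the real outliers. Whether either effect carries over to realistic synthesizers and datasets is the question the paper investigates.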

Keywords

Synthetic data; Anomaly detection; Machine learning

Domains

Computer Science [cs]

Main file

496047_1_En_11_Chapter.pdf (585.97 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03243628

Submitted on: Monday, 31 May 2021, 17:32:47

Last modified on: Monday, 31 May 2021, 18:09:13

Dates and versions

hal-03243628 , version 1 (31-05-2021)

Licence

Attribution

Identifiers

HAL Id : hal-03243628 , version 1
DOI : 10.1007/978-3-030-49669-2_11

Cite

Rudolf Mayer, Markus Hittmeir, Andreas Ekelhart. Privacy-Preserving Anomaly Detection Using Synthetic Data. 34th IFIP Annual Conference on Data and Applications Security and Privacy (DBSec), Jun 2020, Regensburg, Germany. pp.195-207, ⟨10.1007/978-3-030-49669-2_11⟩. ⟨hal-03243628⟩

Collections

IFIP-LNCS IFIP IFIP-TC IFIP-WG IFIP-TC11 IFIP-WG11-3 IFIP-DBSEC IFIP-LNCS-12122
