DO NOT DISTURB? Classifier Behavior on Perturbed Datasets

Bernd Malle; Peter Kieseberg; Andreas Holzinger

doi:10.1007/978-3-319-66808-6_11

Conference paper. Year: 2017


Bernd Malle

Role: Author
PersonId: 1022785

Medical University Graz

SBA Research

Peter Kieseberg

Role: Author
PersonId: 993469

Medical University Graz

SBA Research

Andreas Holzinger

Role: Author
PersonId: 1022786

Medical University Graz

Abstract

Exponential trends in data generation are presenting today’s organizations, economies and governments with challenges never encountered before, especially in the field of privacy and data security. One crucial trade-off facing regulators is the need to publish personal information for statistical analysis and Machine Learning, in order to raise quality levels in areas such as medical services, while at the same time protecting the identity of individuals. A key European measure will be the introduction of the General Data Protection Regulation (GDPR) in 2018, giving customers the ‘right to be forgotten’, i.e. the right to have their data deleted on request. As this could lead to a competitive disadvantage for European companies, it is important to understand what effects the deletion of significant data points has on the performance of ML techniques. In a previous paper we introduced a series of experiments applying different algorithms to a binary classification problem under anonymization as well as perturbation. In this paper we extend those experiments to multi-class classification and introduce outlier removal as an additional scenario. While the results of our previous work were mostly in line with our expectations, our current experiments revealed unexpected behavior over a range of different scenarios. A surprising conclusion of those experiments is that classification on an anonymized dataset with outliers removed beforehand can almost compete with classification on the original, un-anonymized dataset. This could soon lead to competitive Machine Learning pipelines on anonymized datasets for real-world usage in the marketplace.

Keywords

Machine learning; Knowledge bases; Right to be forgotten; Perturbation; K-anonymity; SaNGreeA; Information loss; Cost weighing vector; Multi-class classification; Outlier analysis; Variance-sensitive analysis

Domains

Computer Science [cs]; Information and Communication Sciences

Main file

456304_1_En_11_Chapter.pdf (1.5 MB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01677128

Submitted on: Monday, January 8, 2018, 09:49:24

Last modified on: Wednesday, March 28, 2018, 16:35:04

Long-term archiving on: Wednesday, May 23, 2018, 14:44:13

Dates and versions

hal-01677128, version 1 (08-01-2018)

License

Attribution

Identifiers

HAL Id: hal-01677128, version 1
DOI: 10.1007/978-3-319-66808-6_11

Cite

Bernd Malle, Peter Kieseberg, Andreas Holzinger. DO NOT DISTURB? Classifier Behavior on Perturbed Datasets. 1st International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Aug 2017, Reggio, Italy. pp.155-173, ⟨10.1007/978-3-319-66808-6_11⟩. ⟨hal-01677128⟩


Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC5 IFIP-WG IFIP-TC12 IFIP-TC8 IFIP-WG8-4 IFIP-WG8-9 IFIP-LNCS-10410 IFIP-CD-MAKE IFIP-WG12-9

314 views

164 downloads
