Rough Sets in Imbalanced Data Problem: Improving Re–sampling Process

Katarzyna Borowska; Jarosław Stepaniuk

doi:10.1007/978-3-319-59105-6_39

Conference paper, Year: 2017

Katarzyna Borowska

Role: Author
PersonId : 1023033

Białystok University of Technology

Jarosław Stepaniuk

Role: Author
PersonId : 1023034

Białystok University of Technology

Abstract

The imbalanced data problem remains one of the most interesting and important research subjects. Recent experiments and detailed analyses have revealed that performance loss in machine learning is caused not only by underrepresented classes but also by the inherently complex characteristics of the data. The significant difficulty factors identified so far include class overlapping, decomposition of the minority class, and the presence of noise and outliers. Although numerous solutions have been proposed, it is still unclear how to deal with all of these issues together and how to evaluate the class distribution correctly in order to select a proper treatment, especially in real-world applications where levels of uncertainty are very high. Since applying rough set theory to the imbalanced data learning problem could be a promising research direction, this paper introduces an improved re-sampling approach that combines selective preprocessing and editing techniques. The novel technique handles both qualitative and quantitative data.
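
As an illustration of the rough set notions the abstract refers to, the following minimal Python sketch computes the standard lower and upper approximations of a minority class over qualitative attributes. It is not the authors' re-sampling algorithm, which the abstract does not specify; the function name `approximations` and the toy data are hypothetical, and two objects are assumed indiscernible when all of their attribute values are equal.

```python
from collections import defaultdict


def approximations(objects, labels, target):
    """Return (lower, upper) rough set approximations of class `target`.

    objects: list of tuples of qualitative attribute values.
    labels:  list of class labels, same length as `objects`.
    Assumption: objects with identical attribute tuples are indiscernible.
    """
    # Group object indices into indiscernibility classes.
    blocks = defaultdict(set)
    for i, attrs in enumerate(objects):
        blocks[attrs].add(i)

    target_set = {i for i, y in enumerate(labels) if y == target}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target_set:   # block lies entirely inside the class
            lower |= block
        if block & target_set:    # block shares at least one object with the class
            upper |= block
    return lower, upper


# Hypothetical toy data: two qualitative attributes, minority class "pos".
objects = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "z"), ("c", "z"), ("c", "z")]
labels = ["pos", "neg", "pos", "neg", "neg", "neg"]

lower, upper = approximations(objects, labels, "pos")
boundary = upper - lower  # objects that cannot be classified with certainty
```

Here the lower approximation contains the objects that certainly belong to the minority class, while the boundary region contains objects indiscernible from examples of the other class, which is one way to make class overlapping explicit.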

Keywords

Data preprocessing; Class imbalance; Rough sets; SMOTE; Oversampling; Undersampling

Domains

Computer Science [cs]; Information and Communication Sciences

Main file

448933_1_En_39_Chapter.pdf (85.22 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01656246

Submitted on: Tuesday, December 5, 2017, 14:58:50

Last modified on: Wednesday, December 6, 2017, 01:21:00

Dates and versions

hal-01656246 , version 1 (05-12-2017)

License

Attribution

Identifiers

HAL Id: hal-01656246, version 1
DOI: 10.1007/978-3-319-59105-6_39

Cite

Katarzyna Borowska, Jarosław Stepaniuk. Rough Sets in Imbalanced Data Problem: Improving Re–sampling Process. 16th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Jun 2017, Bialystok, Poland. pp.459-469, ⟨10.1007/978-3-319-59105-6_39⟩. ⟨hal-01656246⟩

Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC8 IFIP-CISIM IFIP-LNCS-10244
