Conference paper, Year: 2016

Imbalanced Data Classification: A Novel Re-sampling Approach Combining Versatile Improved SMOTE and Rough Sets

Katarzyna Borowska
  • Role: Author
  • PersonId : 1023033
Jarosław Stepaniuk
  • Role: Author
  • PersonId : 1023034

Abstract

In recent years, the problem of learning from imbalanced data has emerged as important and challenging. The underrepresentation of one class in the data set is not the only source of difficulty. A complex data distribution, especially small disjuncts, noise and class overlapping, contributes to a significant degradation of classifier performance. Hence, numerous solutions have been proposed. They fall into three groups: data-level techniques, algorithm-level methods and cost-sensitive approaches. This paper presents a novel data-level method combining Versatile Improved SMOTE and rough sets. The algorithm was applied to two-class problems in which the data sets were characterized by nominal attributes. We evaluated the proposed technique in comparison with other preprocessing methods. The impact of the additional cleaning phase was specifically verified.
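To make the idea of a data-level technique on nominal attributes concrete, here is a minimal sketch of SMOTE-style oversampling for a minority class. It is not the algorithm proposed in the paper: it leaves out the rough-set cleaning phase entirely, and the function names, the Hamming distance and the per-attribute majority vote are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the attribute positions where two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))


def oversample_nominal(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority records (illustrative sketch only).

    Each synthetic record takes, per attribute, the most common value
    among a randomly chosen base record and its k nearest minority
    neighbours under Hamming distance.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other record object in the minority class.
        others = [r for r in minority if r is not base]
        neighbours = sorted(others, key=lambda r: hamming(base, r))[:k]
        group = [base] + neighbours
        # Majority vote per attribute; on a tie the value seen first wins.
        record = tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*group)
        )
        synthetic.append(record)
    return synthetic


minority = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
new_records = oversample_nominal(minority, n_new=5, k=2)
```

Because every attribute value is copied from an existing minority record, the synthetic records never contain a value unseen in the original data, which is the usual reason to prefer voting over numeric interpolation when attributes are nominal.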
Main file
419526_1_En_4_Chapter.pdf (213.45 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01637478, version 1 (17-11-2017)

License

Attribution

Cite

Katarzyna Borowska, Jarosław Stepaniuk. Imbalanced Data Classification: A Novel Re-sampling Approach Combining Versatile Improved SMOTE and Rough Sets. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.31-42, ⟨10.1007/978-3-319-45378-1_4⟩. ⟨hal-01637478⟩