Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem

Abstract: The article concerns the problem of imbalanced data classification, in which the classes to which elements belong are not equally represented. Cross-validation is one of the most popular techniques for assessing the efficacy of a classifier while a classification model is being built. When over-sampling methods are used to create new objects and balance the number of objects in each class, an inappropriate choice of the preprocessing moment has a direct impact on the achieved results, which in most cases are overestimated. To present and assess this phenomenon, this paper uses three preprocessing techniques (SMOTE, Safe-level SMOTE, SPIDER) and their modifications to generate new elements of the data sets and balance the class cardinalities, and compares two classification methods (SVM, C4.5). k-fold cross-validation ($$k=10$$) is performed for two preprocessing moments. Measures such as precision, recall, F-measure and the area under the ROC curve (AUC) are calculated and compared.
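The abstract does not spell out the two preprocessing moments. A common reading is (a) over-sampling the whole data set before splitting it into folds versus (b) over-sampling only the training part of each fold. The following is a minimal Python sketch under that assumption, not the paper's implementation: it uses a simplified SMOTE-style interpolation (not Safe-level SMOTE or SPIDER), a 1-nearest-neighbour classifier as a stand-in for SVM and C4.5, and it omits AUC. All function names (`smote`, `balance`, `stratified_folds`, `cv_splits`, `predict_1nn`, `precision_recall_f`) and the toy data are hypothetical.

```python
import math
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """Simplified SMOTE-style interpolation (needs >= 2 minority points)."""
    rng = rng or random.Random(0)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def balance(X, y, minority_label, rng):
    """Add synthetic minority objects until both classes have equal size."""
    minority = [x for x, c in zip(X, y) if c == minority_label]
    n_new = sum(c != minority_label for c in y) - len(minority)
    new = smote(minority, max(n_new, 0), rng=rng)
    return list(X) + new, list(y) + [minority_label] * len(new)


def stratified_folds(y, k, rng):
    """Deal the shuffled indices of each class round-robin into k folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, c in enumerate(y) if c == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_splits(X, y, k, minority_label, moment, seed=0) -> Iterator[tuple]:
    """Yield (train_X, train_y, test_X, test_y) for each of the k folds.

    moment="before": over-sample the whole data set, then split it.
    moment="inside": split first, then over-sample only the training part.
    """
    rng = random.Random(seed)
    if moment == "before":
        X, y = balance(X, y, minority_label, rng)
    folds = stratified_folds(y, k, rng)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        if moment == "inside":
            train_X, train_y = balance(train_X, train_y, minority_label, rng)
        yield train_X, train_y, [X[i] for i in test_idx], [y[i] for i in test_idx]


def predict_1nn(train_X, train_y, test_X):
    """1-nearest-neighbour stand-in for the paper's SVM / C4.5 classifiers."""
    return [train_y[min(range(len(train_X)),
                        key=lambda i: math.dist(x, train_X[i]))]
            for x in test_X]


def precision_recall_f(y_true, y_pred, positive):
    """Precision, recall and F-measure (F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
    X += [(data_rng.gauss(1, 1), data_rng.gauss(1, 1)) for _ in range(10)]
    y = [0] * 90 + [1] * 10  # label 1 is the minority class
    for moment in ("before", "inside"):
        y_true, y_pred = [], []
        for tr_X, tr_y, te_X, te_y in cv_splits(X, y, 10, 1, moment):
            y_true += te_y
            y_pred += predict_1nn(tr_X, tr_y, te_X)
        print(moment, precision_recall_f(y_true, y_pred, positive=1))
```

In the "before" variant the test folds contain synthetic objects interpolated from points that also sit in the training folds, which is one plausible way the overestimation described in the abstract can arise. In the "inside" variant every test fold contains only original objects, so the scores are computed on data the over-sampler never touched.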
Document type: Conference paper
Khalid Saeed; Władysław Homenda. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. Springer International Publishing, Lecture Notes in Computer Science, LNCS-9842, pp.183-194, 2016, Computer Information Systems and Industrial Management. 〈10.1007/978-3-319-45378-1_17〉

Cited literature: 11 references

https://hal.inria.fr/hal-01637457
Contributor: Hal Ifip
Submitted on: Friday, 17 November 2017, 15:43:15
Last modified on: Saturday, 18 November 2017, 01:16:35
Archived on: Sunday, 18 February 2018, 14:29:33

File

Restricted access
File visible from: 2019-01-01

License

Distributed under a Creative Commons Attribution 4.0 International License


Citation

Paweł Szeszko, Magdalena Topczewska. Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem. Khalid Saeed; Władysław Homenda. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. Springer International Publishing, Lecture Notes in Computer Science, LNCS-9842, pp.183-194, 2016, Computer Information Systems and Industrial Management. 〈10.1007/978-3-319-45378-1_17〉. 〈hal-01637457〉

