Conference papers

Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem

Abstract: This article addresses imbalanced data classification, in which the classes are not equally represented. Cross-validation is one of the most popular techniques for assessing the efficacy of a classifier during model building. Over-sampling methods create new objects to balance the number of objects across classes, but applying this preprocessing at an inappropriate moment relative to cross-validation directly affects the results, which in most cases are overestimated. To present and assess this phenomenon, the paper uses three preprocessing techniques (SMOTE, Safe-level SMOTE, SPIDER) and their modifications to generate new elements that balance the class cardinalities, and compares two classification methods (SVM, C4.5). k-fold cross-validation ($k=10$) is performed for two moments of preprocessing. Precision, recall, F-measure, and area under the ROC curve (AUC) are calculated and compared.
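The effect described in the abstract can be illustrated with a small sketch. It assumes the two moments are (A) over-sampling the whole data set before it is split into folds and (B) over-sampling only the training part inside each fold; the abstract does not spell this out. Plain random duplication stands in for SMOTE-like methods, and all function names and data sizes are hypothetical. This is not the authors' experimental code. It only counts how many test rows share an origin point with the training part, which is one plausible source of overestimated scores.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def oversample(rows, rng):
    """Random over-sampling: duplicate minority rows until classes balance.

    Assumption: a simple stand-in for SMOTE-like methods.
    Each row is (origin_id, label), with label 1 for the minority class.
    """
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra

def leaked_test_rows(train, test):
    """Count test rows whose origin point also appears in the training part."""
    train_origins = {origin for origin, _ in train}
    return sum(1 for origin, _ in test if origin in train_origins)

def compare_moments(n_major=90, n_minor=10, k=10, seed=0):
    """Return (leaks with moment A, leaks with moment B), summed over folds."""
    rng = random.Random(seed)
    data = [(i, 0) for i in range(n_major)]
    data += [(n_major + i, 1) for i in range(n_minor)]

    # Moment A: over-sample the whole set first, then cross-validate.
    balanced = oversample(data, rng)
    leaks_before = 0
    for fold in k_fold_indices(len(balanced), k, rng):
        held_out = set(fold)
        test = [balanced[i] for i in fold]
        train = [balanced[i] for i in range(len(balanced)) if i not in held_out]
        leaks_before += leaked_test_rows(train, test)

    # Moment B: split first, over-sample only the training part of each fold.
    leaks_inside = 0
    for fold in k_fold_indices(len(data), k, rng):
        held_out = set(fold)
        test = [data[i] for i in fold]
        train = oversample(
            [data[i] for i in range(len(data)) if i not in held_out], rng
        )
        leaks_inside += leaked_test_rows(train, test)

    return leaks_before, leaks_inside

if __name__ == "__main__":
    before, inside = compare_moments()
    print("test rows with a copy in training, moment A:", before)
    print("test rows with a copy in training, moment B:", inside)
```

With moment B the test fold never shares an origin with the training part, because synthetic rows are created only after the split. With moment A, copies of the same minority object can land on both sides of the split, so the classifier is partly tested on data it has already seen.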

Cited literature: 11 references

https://hal.inria.fr/hal-01637457
Contributor: Hal Ifip
Submitted on: Friday, November 17, 2017 - 3:43:15 PM
Last modification on: Saturday, November 18, 2017 - 1:16:35 AM
Long-term archiving on: Sunday, February 18, 2018 - 2:29:33 PM

File

419526_1_En_17_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Paweł Szeszko, Magdalena Topczewska. Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.183-194, ⟨10.1007/978-3-319-45378-1_17⟩. ⟨hal-01637457⟩
