# Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem

Abstract : The article concerns the problem of imbalanced data classification, in which the classes that elements belong to are not equally represented. In the process of building a classification model, cross-validation is one of the most popular techniques for assessing the efficacy of a classifier. Over-sampling methods create new objects to balance the number of objects across classes, but applying this preprocessing at an inappropriate moment has a direct impact on the achieved results: in most cases they are overestimated. To present and assess this phenomenon, this paper uses three preprocessing techniques (SMOTE, Safe-level SMOTE, SPIDER) and their modifications to generate new elements that balance the class cardinalities, and compares two classification methods (SVM, C4.5). $k$-fold cross-validation ($k=10$) is performed for two moments of preprocessing. Measures such as precision, recall, F-measure and area under the ROC curve (AUC) are calculated and compared.
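
The abstract does not spell out the two preprocessing moments. A common reading is (a) over-sampling the whole data set once, before it is split into folds, and (b) over-sampling only the training part inside each fold. The following is a minimal Python sketch of that contrast under stated assumptions, not the authors' experimental setup: the toy data, the jittered-copy over-sampler (a simple stand-in for SMOTE), the 1-NN classifier (a stand-in for SVM or C4.5), and all function names are hypothetical. Only recall is computed.

```python
import random

def make_data(n_major=90, n_minor=10, seed=0):
    """Toy 2-D data: label 0 is the majority class, label 1 the minority."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_major)]
    data += [((rng.gauss(1, 1), rng.gauss(1, 1)), 1) for _ in range(n_minor)]
    rng.shuffle(data)
    return data

def oversample(data, rng, jitter=0.01):
    """Balance classes with jittered minority copies (a stand-in, not SMOTE)."""
    minority = [d for d in data if d[1] == 1]
    n_major = len(data) - len(minority)
    extra = []
    while minority and len(minority) + len(extra) < n_major:
        (x, y), _ = rng.choice(minority)
        extra.append(((x + rng.gauss(0, jitter), y + rng.gauss(0, jitter)), 1))
    return data + extra

def predict_1nn(train, point):
    """Label of the training object nearest to `point` (assumed 1-NN classifier)."""
    def sq_dist(d):
        return (d[0][0] - point[0]) ** 2 + (d[0][1] - point[1]) ** 2
    return min(train, key=sq_dist)[1]

def cv_recall(data, k, rng, oversample_first):
    """Minority-class recall pooled over k folds.

    oversample_first=True  : balance the whole set, then split into folds.
    oversample_first=False : split first, balance only each training part.
    """
    if oversample_first:
        data = oversample(data, rng)
        rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    hits = positives = 0
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        if not oversample_first:
            train = oversample(train, rng)
        for point, label in test:
            if label == 1:
                positives += 1
                hits += predict_1nn(train, point) == 1
    return hits / positives

if __name__ == "__main__":
    data = make_data()
    print("over-sampling before CV:", cv_recall(data, 10, random.Random(1), True))
    print("over-sampling inside CV:", cv_recall(data, 10, random.Random(1), False))
```

When over-sampling happens before the split, near-copies of the same minority object tend to land in both the training and the test part of a fold, so the test part is no longer independent of the training data. In this sketch that typically yields a higher recall than over-sampling inside each fold, which is consistent with the overestimation the abstract describes.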
Document type :
Conference papers

https://hal.inria.fr/hal-01637457
Contributor : Hal Ifip
Submitted on : Friday, November 17, 2017 - 3:43:15 PM
Last modification on : Saturday, November 18, 2017 - 1:16:35 AM
Long-term archiving on : Sunday, February 18, 2018 - 2:29:33 PM

### File

419526_1_En_17_Chapter.pdf
Files produced by the author(s)

### Citation

Paweł Szeszko, Magdalena Topczewska. Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.183-194, ⟨10.1007/978-3-319-45378-1_17⟩. ⟨hal-01637457⟩
