# Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem

Abstract: The article concerns the problem of imbalanced data classification, in which the classes to which elements belong are not equally represented. When building a classification model, cross-validation is one of the most popular techniques for assessing the efficacy of a classifier. Over-sampling methods create new objects to balance the number of objects in the classes, but choosing an inappropriate moment for this preprocessing has a direct impact on the achieved results: in most cases they are overestimated. To present and assess this phenomenon, three preprocessing techniques (SMOTE, Safe-level SMOTE, SPIDER) and their modifications are used to generate new elements that balance the cardinalities of the classes, and two classification methods (SVM, C4.5) are compared. $k$-fold cross-validation ($k=10$) is performed for two preprocessing moments. Measures such as precision, recall, F-measure and area under the ROC curve (AUC) are calculated and compared.
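
The effect of the preprocessing moment can be illustrated with a minimal sketch. It assumes that the two moments are (1) over-sampling the whole data set before it is split into folds and (2) over-sampling only the training part inside each fold; the abstract does not spell this out. Plain duplication of minority objects stands in for SMOTE, Safe-level SMOTE and SPIDER, and the toy data, the round-robin split and the helper names `oversample`, `folds` and `leaked` are hypothetical, not taken from the paper. The sketch only counts test objects that have a copy in the training part; it does not train a classifier.

```python
from collections import Counter

def oversample(items):
    """Duplicate minority items until the classes are balanced.
    Assumption: plain duplication is a simple stand-in for SMOTE-like methods."""
    counts = Counter(label for _, label in items)
    minority = min(counts, key=counts.get)
    pool = [it for it in items if it[1] == minority]
    need = max(counts.values()) - len(pool)
    # Spread the needed copies evenly over the minority items.
    extra = [pool[i * len(pool) // need] for i in range(need)]
    return list(items) + extra

def folds(items, k):
    """Yield (train, test) pairs from a round-robin split into k folds."""
    for f in range(k):
        test = [x for i, x in enumerate(items) if i % k == f]
        train = [x for i, x in enumerate(items) if i % k != f]
        yield train, test

def leaked(train, test):
    """Count test items whose source id also occurs in the training part."""
    train_ids = {item_id for item_id, _ in train}
    return sum(1 for item_id, _ in test if item_id in train_ids)

# Toy data: 90 majority and 10 minority items with unique ids.
data = [(i, "maj") for i in range(90)] + [(i, "min") for i in range(90, 100)]
K = 10

# Moment 1: over-sample the whole data set, then split into folds.
leak_before = sum(leaked(tr, te) for tr, te in folds(oversample(data), K))

# Moment 2: split first, then over-sample the training part only.
leak_inside = sum(leaked(oversample(tr), te) for tr, te in folds(data, K))

print("leaked test items, over-sampling before the split:", leak_before)
print("leaked test items, over-sampling inside each fold:", leak_inside)
```

When over-sampling precedes the split, copies of the same minority object land in both the training and the test part of a fold, so the classifier is partly evaluated on objects it has already seen. This is one plausible mechanism behind the overestimated results mentioned in the abstract. With synthetic rather than duplicated objects the overlap is less literal, but the test part is still influenced by training data.
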
Khalid Saeed; Władysław Homenda. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. Springer International Publishing, Lecture Notes in Computer Science, LNCS-9842, pp.183-194, 2016, Computer Information Systems and Industrial Management. 〈10.1007/978-3-319-45378-1_17〉
https://hal.inria.fr/hal-01637457
