Research on Customer Credit Scoring Model Based on Bank Credit Card

Abstract. With the development of China's economy, and in particular the maturing of its market economy, credit has become important to both society and individuals. At present, the credit system is divided into two parts. The enterprise credit system is an important component of the social credit system; at the same time, as the foundation of the social credit system, the establishment of the personal credit system is of great significance for reducing the cost of collecting information and improving the efficiency of loan processing. At the bank level, this paper discretizes the credit card data of a bank, selects features by calculating the Weight of Evidence (WOE), Information Value (IV), and information divergence, and then uses Logistic Regression for prediction. Finally, the results of the Logistic Regression are transformed into visualized credit scores to establish a credit scoring model. Verification shows that the model has good predictive performance.


Introduction
The construction of the financial system in the 21st century is inseparable from the support of the credit system. The problems and risks reflected in credit follow closely, which shows that there are still shortcomings in China's credit system: first, relevant detailed laws and regulations are lacking; second, the customer data sets used by different enterprises differ, and the reliability and quality of the data sets used by credit agencies need to be improved. As bank credit is the foundation of the credit system, and individual customers form the main part of a bank's customer base, research on personal customer credit rating is of great significance. The establishment of an individual credit investigation system helps banks predict risks in advance for commercial early-warning analysis. A credit scoring system presents the customer's credit report in the form of a score, giving a concise result.

Related work
The concept of credit scoring originated from the idea of overall division put forward by Fisher (1936) in the field of statistics. Durand (1941) realized that this idea of "division" could be used in economics to separate "good" and "bad" loans. With the emergence of the credit card, credit scoring gradually appeared and came into use in banking and other fields. In short, the development of credit scoring systems can be divided into two stages. In the initial stage of the market economy, the traditional credit scoring system, also called the expert scoring system, centered on the "5C" elements. With economic development, modern scoring methods mainly use mathematical statistics to quantify indicators, and at present many credit scoring systems are based on data mining and big-data algorithms. Among the modern methods, discriminant analysis was the first to be used: Durand (1941) first applied it to scoring, Fair (1958) established a credit scoring system on this basis, and Myers (1963) combined discriminant analysis with regression analysis to build a credit scoring system and predict credit scores. Beyond discriminant methods, regression analysis is widely used and has many branches. For example, Henley (1995) used linear regression for credit scoring, while Wiginton (1980) used the logistic method, which remains one of the most commonly used methods in credit scoring because it overcomes the defects of linear regression. In addition, Nath and Jackson (1992) applied mathematical programming, although it has been shown that its effect is equivalent to that of linear programming [1]. Data mining has also been used in credit scoring models, widely applied in credit decision-making and fraud prevention; in this field, Decision Tree algorithms, Neural Networks, and other methods such as Support Vector Machines (SVM) and Bayesian Networks can be used [2].
According to the characteristics of the data set in this paper, and because logistic regression can effectively screen variables, this paper uses logistic regression to analyze the selected features [3]. In recent studies of credit score cards, most results [4] have been expressed as a binary classification. This paper borrows the calculation method of [5] to transform the logistic results into scores, moving beyond binary classification and making the results more intuitive through grading. This paper obtains six months of credit card data for the customers of a bank. After analyzing the data, the Weight of Evidence, Information Value, and information divergence are used to select features, with prior rules considered appropriately during selection, in order to find the best feature extraction method. Through comparison of multiple groups, the optimal feature system is selected, and the selected features are input into the established Logistic Regression model. A credit scoring model is then built to convert the results into credit scores, and customers are classified according to their scores. The classification result is valuable to the bank and also makes it convenient for customers to view their own credit rating.

Construction of Feature Selection System
Data and features determine the upper limit of experimental results; therefore, feature engineering is important for any model or algorithm. This paper mainly uses the feature-selection part of feature engineering to process the data.

Information Value
Information Value (IV) measures the predictive ability of a feature, and its calculation is based on the Weight of Evidence (WOE). Table 1 below lists the specific calculation of the WOE value [6]. From the formula in Table 1, it can be seen that the larger the WOE value, the better the predictive effect of the feature. However, for each bin of each feature the WOE value can be positive or negative, so if WOE alone were used to measure the predictive ability of the whole feature, positive and negative values could offset each other and greatly understate the overall predictive ability. To make up for this deficiency of WOE, the IV, calculated from WOE, has been proposed [6].
According to the formula, the larger the IV, the stronger the predictive ability of the feature. At the same time, to avoid extreme IV values, the data must be reasonably discretized before IV is calculated.
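As an illustration, WOE and IV for one discretized feature can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes the common convention WOE = ln(good distribution / bad distribution) and IV = Σ (good% − bad%) × WOE for the formulas referenced in Table 1, and a target coded 1 = bad, 0 = good.

```python
import numpy as np
import pandas as pd

def woe_iv(binned_feature, target, eps=1e-6):
    """Compute the WOE of each bin and the total IV of one discretized feature.

    target: 1 = bad customer, 0 = good customer.
    eps avoids log(0) when a bin contains only good or only bad samples.
    """
    df = pd.DataFrame({"bin": binned_feature, "bad": target})
    grouped = df.groupby("bin")["bad"].agg(["sum", "count"])
    bad = grouped["sum"]
    good = grouped["count"] - grouped["sum"]
    bad_dist = bad / max(bad.sum(), 1)      # share of all bads in each bin
    good_dist = good / max(good.sum(), 1)   # share of all goods in each bin
    woe = np.log((good_dist + eps) / (bad_dist + eps))
    iv = ((good_dist - bad_dist) * woe).sum()
    return woe, iv
```

Under this convention a bin with proportionally more good customers gets a positive WOE, and IV is always non-negative, so positive and negative contributions no longer cancel.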

Information Divergence
Information divergence measures the contribution of a feature to the whole and is often used for feature selection. Its basis is entropy, which can be subdivided into information entropy and conditional entropy; the calculation formulas are shown in Table 2 [7]. Information divergence is calculated from these two quantities as the difference between the information entropy of the data set and the conditional entropy given a feature: IG(D, A) = H(D) − H(D|A). By writing a Python program, the entropy of the whole data set and the information divergence of each feature can be obtained.
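The entropy and information divergence calculations mentioned above can be sketched in Python as follows. This is a minimal illustration built directly from the standard definitions, not the paper's actual program:

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy H(D) of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Information divergence IG(D, A) = H(D) - H(D|A):
    the reduction in entropy after splitting the data set on feature A."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional
```

Applied to every feature column in turn, this yields the per-feature divergences that are sorted in Table 5.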

Data Preprocessing
This paper selects the bank credit card data of a bank in Taiwan from April to September 2005. The data set has 25 fields, including 23 features, as shown in Table 3. Since there are many features in this data set, it is important to select the most meaningful ones. Considering that both IV and information divergence are statistics for evaluating feature importance, this paper screens features with each of them separately and compares the results, in order to select the feature selection method better suited to this data set.
First, the data are preprocessed, handling missing and abnormal values. The basic description of the data set is given in Fig. 1.

Fig. 1. Basic data description
To achieve a better fit, the 30,000 records are divided into a training set and a test set at a ratio of 7:3.
At the same time, in order to calculate the IV of each feature, this paper combines Optimal Binning with equal-depth segmentation to discretize each feature of the sample, using the AUC obtained under different segmentation schemes as the measurement standard.
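The split and the equal-depth segmentation can be sketched as below. The column names and toy data are hypothetical stand-ins, and `pd.qcut` is one common way to implement equal-depth binning; the Optimal Binning step used in the paper is not reproduced here.

```python
import pandas as pd

def equal_depth_bin(series, n_bins=5):
    """Equal-depth (quantile) discretization: each bin holds roughly
    the same number of samples."""
    return pd.qcut(series, q=n_bins, duplicates="drop")

# Hypothetical stand-in data -- the real set has 30,000 records and 23 features.
df = pd.DataFrame({
    "LIMIT_BAL": range(1, 101),   # assumed feature name, illustration only
    "default": [0, 1] * 50,       # assumed target flag
})
df["LIMIT_BAL_bin"] = equal_depth_bin(df["LIMIT_BAL"])

# 7:3 split into training and test sets, as described in the text.
train = df.sample(frac=0.7, random_state=42)
test = df.drop(train.index)
```

Different bin counts or cut points can then be compared by the AUC of the downstream model, as the paper does.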

Feature Selection System Based on IV
After binning, the IV value of each feature is calculated (see Fig. 2). In general, the predictive ability indicated by IV is graded according to Table 4 [8]. Considering the influence of the prior rules on the data set, it was decided to further consider the contribution of features 3, 4, and 5 to the model on the basis of the initial IV-based feature system, obtaining an extended feature system that contains features 1, 3, 4, 5, 8, 9, 10, 11, 18, 19, 20, and 23.

Feature Selection System Based on Information Entropy
After preprocessing the original data, the data set is input into the Python program described above; the entropy of the whole data set is 0.762353, which shows that the data set is orderly and carries much valuable information. The prepared function is then called to calculate the conditional entropy of each feature, and after sorting, the information divergence of each feature is obtained, as shown in Table 5.

Comparison of Feature Selection System
In the first group, the WOE/IV method is compared with information divergence for feature selection. Information divergence emphasizes the feature with the greatest contribution, while IV allows the importance of each feature to be observed more intuitively. To verify which of the two is more suitable for the field studied in this paper, four groups of specific comparisons are made. Among them, A1, B1, C1, and D1 are the feature systems obtained through IV, and A2, B2, C2, and D2 are the feature systems obtained based on entropy (see Fig. 3). It can be seen from the figure that, when the same type of sample features is selected, the models built on IV-selected features are generally better than those built on features selected by information divergence. In the second group, the AUC values of A1, B1, C1, and D1 are shown (see Fig. 4); the best-performing IV-based system is therefore selected as the feature system of this paper, input into the Logistic Regression model, and used to construct the scoring model. In the third group, the four feature systems selected based on information divergence are compared; the AUC calculated for A2, B2, C2, and D2 is shown (see Fig. 5). The model effect of D2 is the best, that is, when the selected feature types account for 60% of the total features, the model performs better.

Fig. 5. AUC Based on Information Divergence
Through the above comparison, several rules of reference value can be obtained:
1. When multiple indicators are selected, the model based on IV is better than that based on information divergence.
2. In general, when the calculated IV is slightly greater than 0.5, it is better to consider the impact of that feature on the overall data.
3. When selecting features based on information divergence, the divergences can be sorted from large to small, and selecting 60% of the total feature types gives a better effect.

Construction of the Credit Scoring Model
The Logistic Regression model can explain the dependent variable and is often used for prediction problems with a binary outcome. Moreover, it overcomes the defects of the linear regression model and has strong applicability in credit rating, which makes it suitable here. After inputting the selected feature system into the Logistic Regression model, the AUC obtained in the previous section shows that the predictive ability of the model is good. The logistic model [9] is given in Table 6.
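As an illustration of this step, here is a minimal sketch. The data are synthetic stand-ins for the real WOE-transformed features, and scikit-learn's `LogisticRegression` is one possible implementation, not necessarily the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the WOE-transformed features of the selected system
# (12 columns here; the paper feeds its real WOE values into the model).
X = rng.normal(size=(1000, 12))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

# AUC as the evaluation criterion (the paper reports 0.76 on its test set).
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
coef, intercept = model.coef_[0], model.intercept_[0]
```

The fitted coefficients and intercept are exactly the quantities that the scoring transformation in the next step consumes.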
The logarithmic form of the odds is obtained in the table above: the logarithm of the odds ratio is expressed as a linear combination of the feature variables. The WOE of each feature is multiplied by the regression coefficient of that variable, the regression intercept is added, and the result is multiplied by the scale factor and shifted by the offset; the corresponding score of each feature is then obtained according to formula (4) [5], where odds is the ratio of good to bad customers. Based on historical experience, this paper takes the ratio of good to bad customers as 20; in order to keep the calculated scores positive, the basic score at this odds is set to 200, and the score changes by 20 points each time the odds double, from which the values of factor and offset can be calculated. In order to better observe the classification results, statistics on the proportion of good and bad customers in the five customer classes are given in Table 7. The results of the credit scoring model show that the two classes with high credit ratings, classes I and II, consist mostly of good customers; classes III and IV consist mostly of bad customers; and class V, the completely untrusted class, consists entirely of bad customers. The experimental results are consistent with the actual pattern, which further shows that the model has reference value.
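The factor and offset follow the standard scorecard scaling. A sketch using the values stated in the text (base score 200 at odds 20:1, 20 points per doubling of the odds):

```python
import math

def scorecard_scaling(base_score=200.0, base_odds=20.0, pdo=20.0):
    """Scorecard scaling constants for score = offset + factor * ln(odds):
    the score moves by `pdo` points each time the odds double."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

factor, offset = scorecard_scaling()  # factor = 28.85, offset = 113.56 (2 d.p.)
```

With the parameters from the text, this reproduces the paper's values factor = 28.85 and offset = 113.56.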

Conclusion
In this paper, the bank credit card data are used for a horizontal comparison between WOE/IV and information divergence, and for a vertical comparison between feature selection with and without prior rules. The feature system based on IV and the prior rules is selected: 11 of the 23 features are extracted as the scoring basis, which reduces the complexity of data processing. In the process of feature selection, after many comparative tests, three prior rules are obtained. Then, using the logistic regression model, the calculated WOE values are input into the model, giving an AUC of 0.76, and the regression coefficient of each feature is obtained, so as to build a credit rating model and derive customer credit ratings. Users are classified according to their ratings, and comparison with the actual data shows that the classification results are consistent with reality. The customer credit rating model based on bank credit cards constructed in this paper allows the bank to refine customer classification through the generated ratings; the result is intuitive for user classification and has warning and reference value for the bank's credit business.

The total score over all characteristics of a customer is obtained as
Score = (β0 + Σ_{i=1}^{n} βi × woe_i) × factor + offset = Σ_{i=1}^{n} βi × woe_i × factor + (β0 × factor + offset),
where the constant term β0 × factor + offset is the base score. After calculation, factor = 28.85, offset = 113.56, and base score = 154. According to the credit scoring model, the total score of customers in this paper falls within the range [0, 200]. After studying the customer scores, it was decided to divide customers by equidistant segmentation into class I [0, 40), class II [40, 80), class III [80, 120), class IV [120, 160), and class V [160, 200], with the customer's trustworthiness decreasing from class I to class V. The histogram of the overall customer classification is shown in Fig. 6. It can be seen that class I has the most customers and class V the fewest, indicating that most customers have high credit value.
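The equidistant segmentation into the five classes can be sketched as follows (a minimal helper, assuming scores already lie in [0, 200]):

```python
def classify(score):
    """Map a credit score in [0, 200] to customer class I..V
    using equidistant 40-point bands; a score of exactly 200 falls in class V."""
    labels = ["I", "II", "III", "IV", "V"]
    return labels[min(int(score // 40), 4)]
```
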

Table 1 .
Weight of Evidence.

Table 2 .
Calculation formula of entropy.

Table 3 .
Data feature description.

Table 5 .
Information divergence of 23 features. It can be seen from the table that the information divergence of feature 12, the September bill amount (BILL_AMT1), is the largest, making it the optimal feature. Matching the numbers of features in the A1, B1, C1, and D1 feature systems, four feature systems are selected according to information divergence and recorded as A2, B2, C2, and D2. A2 selects the same number of features as A1, removing the 14 features with the smallest information gain; the selected feature types account for 39% of the total, so A2 includes features 20, 19, 18, 17, 16, 15, 14, 13, and 12. B2 selects the same number of features as B1, removing 11 feature variables; the selected feature types account for 52% of the total, so B2 includes features 21, 23, 20, 19, 18, 17, 16, 15, 14, 13, and 12. C2 selects the same number of features as C1, removing 12 feature variables; the selected feature types account for 47% of the total, and C2 includes features 23, 20, 19, 18, 17, 16, 15, 14, 13, and 12. D2 selects the same number of features as D1, removing 9 feature variables; the selected feature types account for 60% of the total, and D2 includes features 6, 22, 21, 23, 20, 19, 18, 17, 16, 15, 14, 13, and 12.

Table 7 .
Proportion of good and bad customers in 5 types of customers.