Convolutional Neural Networks Optimized by Logistic Regression Model

In recent years, convolutional neural networks have been widely used, especially in the field of large-scale image processing. This paper introduces the application of two kinds of logistic regression classifier in the convolutional neural network. The first is the logistic regression classifier, a classifier for binary classification problems that can also be extended to multi-class problems. The second is the multi-class logistic regression classifier, also known as the softmax regression classifier. Both classifiers achieve good results on MNIST handwritten digit recognition.


Introduction
In recent years, since the convolutional neural network was proposed [1], it has been widely used in pattern recognition [5] and image processing, and has achieved especially good results in large-scale image processing [6]. The paper [9] gives a detailed theoretical analysis of the convolutional neural network, and since then various classification algorithms and models have been proposed. The paper [4] proposed the multilayer perceptron as a convolutional neural network classifier and also used the k-nearest-neighbor algorithm as a classifier; the paper [13] used the support vector machine (SVM) as the classifier; both achieved good results in handwritten digit recognition experiments. The paper [12] mainly introduces the linear regression and logistic regression models. The paper [15] introduces the softmax regression model and gives a detailed derivation of the algorithm combined with the back-propagation algorithm. The structure of this paper is divided into five parts. The second part introduces the convolutional neural network structure. The third part introduces two classification models: the logistic regression model, including how to use logistic regression to solve multi-class problems, and the softmax regression model. The fourth part is the experiment and result analysis. The last part is the summary.

Structure of convolutional neural network
A convolutional neural network can be functionally divided into two parts: the image feature extraction part and the classifier part. In our experiment, the convolutional neural network is structurally divided into seven layers: an input layer, convolutional layer C1, sub-sampling layer S2, convolutional layer C3, sub-sampling layer S4, the unfolded layer of S4 (not counted in the number of layers), a fully connected layer, and an output layer. The feature extraction part includes the C1, S2, C3, and S4 layers; the classifier part includes the unfolded S4 layer (which also serves as the input layer of the classifier), the fully connected layer, and the output layer.
The convolutional neural network structure has many variants; this article uses a classic convolutional neural network. Since this paper mainly introduces the application of two kinds of logistic regression classifiers in the convolutional neural network, we only briefly introduce the convolutional neural network itself; a more detailed treatment can be found in the classic paper [9]. Figure 1 shows the structure of the convolutional neural network used in this paper. The handwritten digit recognition experiment is carried out on the MNIST data set.
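The feature-map sizes through the layers above can be traced with a short calculation. The 5x5 convolution kernels and 2x2 non-overlapping sub-sampling windows used here are assumptions in the LeNet style; the paper does not state its exact filter sizes.

```python
# Sketch: trace feature-map side lengths through a LeNet-style CNN on
# 28x28 MNIST images. Kernel and pooling sizes are assumed values.

def conv_out(size, kernel):
    """Side length after a 'valid' convolution."""
    return size - kernel + 1

def pool_out(size, window):
    """Side length after non-overlapping sub-sampling."""
    return size // window

size = 28                    # MNIST input image
size = conv_out(size, 5)     # C1: 24x24
size = pool_out(size, 2)     # S2: 12x12
size = conv_out(size, 5)     # C3: 8x8
size = pool_out(size, 2)     # S4: 4x4
print(size)                  # side length of the maps fed to the unfolded layer
```

The unfolded S4 layer then flattens these small maps into one vector that serves as the classifier's input.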

Classifier model based on logistic regression
Logistic regression is a classifier for binary classification problems. To use it for a multi-class problem, we need to train one logistic regression classifier for each category. The softmax regression classifier solves the multi-class problem directly. The image features extracted by the convolutional neural network are used as the input layer of the classifier network, the classifier assigns the image to a class, and the back-propagation algorithm updates the weight parameters throughout training. Next, we introduce these two classifiers in detail.

Logistic Regression Model
The binary classification problem divides samples into two categories, c_1 and c_2; the output layer then has only one neuron, with output y. Assuming the output layer has n weight parameters, w = [w_1, w_2, ..., w_n]. Since the output value y is produced by the sigmoid function, whose output range is 0 to 1, we can interpret y as the probability that the input vector x_1 belongs to the first category c_1. Conversely, 1 - y gives the probability that the input vector x_1 belongs to the second category c_2.
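A minimal sketch of this output unit follows; the weight vector w and input x are illustrative values, not the paper's trained parameters.

```python
import numpy as np

# Sketch: a single logistic output neuron. The sigmoid output y is read
# as P(c1 | x), and 1 - y as P(c2 | x).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3, 0.8])   # n = 3 weight parameters (illustrative)
x = np.array([1.0, 2.0, 0.5])    # one input feature vector (illustrative)
y = sigmoid(w @ x)               # probability that x belongs to c1

print(y, 1.0 - y)                # the two class probabilities
```

By construction the two probabilities sum to 1, which is what makes the single-neuron output sufficient for the binary case.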
For the binary classification problem with N training samples, the cost function E can be expressed as:

E = -(1/N) ∑_{n=1}^{N} [ t_n ln(y_n) + (1 - t_n) ln(1 - y_n) ]

where t_n is the target label and y_n is the network output for the n-th sample. If we use logistic regression to solve a multi-class problem, we need to train one logistic regression classifier for each category. For the K-class problem, the output layer has K neural units, and solving the K-class problem with logistic regression requires training K logistic regression classifiers. For a specific category, the remaining categories are treated as a single category, turning it into a binary classification problem. For the K-class problem with N training samples, the cost function E can be expressed as:

E = -(1/N) ∑_{n=1}^{N} ∑_{k=1}^{K} [ t_nk ln(y_nk) + (1 - t_nk) ln(1 - y_nk) ]

To prevent overfitting during the training of the neural network, we can add regularization to the cost function. Regularization controls the model's complexity and makes the model generalize better to unseen data. To simplify the problem, assume the neural network has only an input layer and an output layer, with M neural units in the input layer, K neural units in the output layer, and weights w between them. The cost function is then:

E = -(1/N) ∑_{n=1}^{N} ∑_{k=1}^{K} [ t_nk ln(y_nk) + (1 - t_nk) ln(1 - y_nk) ] + (λ/2) ∑_{m=1}^{M} ∑_{k=1}^{K} (w_mk)^2

In the above model, we only add the L2 regularization. The w_mk represents the connection weight between neural unit m and neural unit k.
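The regularized binary cost above can be sketched directly; the outputs, targets, weights, and regularization strength below are illustrative values, not the paper's settings.

```python
import numpy as np

# Sketch: binary cross-entropy cost with an L2 weight-decay term, as in
# the regularized cost function above. All inputs are illustrative.

def cost(y, t, w, lam):
    data_term = -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))
    reg_term = 0.5 * lam * np.sum(w ** 2)   # (lambda/2) * sum of squared weights
    return data_term + reg_term

y = np.array([0.9, 0.2, 0.7])   # network outputs for N = 3 samples
t = np.array([1.0, 0.0, 1.0])   # target labels
w = np.array([0.5, -0.3])       # weights being penalized
print(cost(y, t, w, 0.01))
```

Note that the regularization term grows with the squared magnitude of the weights, so minimizing the total cost trades data fit against weight size.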

Softmax Regression Model
We use the softmax function instead of the sigmoid activation function when solving the K-class classification problem, and obtain the class probabilities:

y_k = exp(u_k) / ∑_{j=1}^{K} exp(u_j)

This is called a multinomial logit model. The target vector z satisfies the relationship ∑_{k=1}^{K} z_k = 1. Assuming K = 5, if the correct class of a training sample is 3, then the target vector should be z = [0, 0, 1, 0, 0]. The probability of the target output vector z given the input vector x_1 is then:

P(z | x_1) = ∏_{k=1}^{K} (y_k)^{z_k}

For the multi-class problem with N training samples, the cost function of the softmax regression model is:

E = -(1/N) ∑_{n=1}^{N} ∑_{k=1}^{K} z_nk ln(y_nk)

Because the model has redundant weight parameters, we need to add a weight decay term, which penalizes large parameter values, to modify the cost function; our cost function is now:

E = -(1/N) ∑_{n=1}^{N} ∑_{k=1}^{K} z_nk ln(y_nk) + (λ/2) ∑_{m=1}^{M} ∑_{k=1}^{K} (w_mk)^2
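The softmax probabilities and the per-sample cross-entropy cost can be sketched as follows, using the K = 5 example above; the output-layer activations are illustrative values.

```python
import numpy as np

# Sketch: softmax class probabilities and the softmax-regression cost for
# one sample with K = 5 classes and correct class 3 (one-hot index 2).

def softmax(a):
    e = np.exp(a - a.max())      # shift by the max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 2.0, 5.0, 0.5, 1.5])   # illustrative activations
y = softmax(scores)              # class probabilities, sum to 1
z = np.array([0, 0, 1, 0, 0])    # one-hot target vector
cost = -np.sum(z * np.log(y))    # cross-entropy for this sample
print(y, cost)
```

Because z is one-hot, the cost reduces to the negative log-probability assigned to the correct class, so it shrinks toward 0 as that probability approaches 1.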

Back propagation
The "errors" which we propagate backwards through the network can be thought of as "sensitivities" of each unit with respect to perturbations of the bias [16]. That is to say, since ∂u/∂b = 1, the sensitivity of the output layer can be obtained directly.
By deriving the formula, we can obtain the sensitivity of the previous layer as follows:

δ^(l) = (W^(l+1))^T δ^(l+1) ∘ f'(u^(l))

where ∘ denotes element-wise multiplication. Because we use the back-propagation algorithm, we also need to know the gradient when updating the weight parameters. The gradient formula is as follows:

∂E/∂W^(l) = δ^(l) (x^(l-1))^T
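One back-propagation step through a fully connected layer can be sketched as below; the layer sizes, weights, and sensitivities are illustrative values chosen only to exercise the two formulas above, assuming a sigmoid activation.

```python
import numpy as np

# Sketch: propagate the sensitivity delta^(l+1) back through one fully
# connected layer and form the weight gradient, per the formulas above.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_prev = np.array([[0.5], [1.0], [-0.3]])       # x^(l-1): 3 inputs
W_l = np.array([[0.1, -0.2, 0.3],
                [0.4, 0.3, -0.1]])              # W^(l): layer l has 2 units
u_l = W_l @ x_prev                              # pre-activation u^(l)

W_next = np.array([[0.2, -0.4],
                   [0.1, 0.5],
                   [-0.3, 0.2]])                # W^(l+1): layer l+1 has 3 units
delta_next = np.array([[0.1], [-0.3], [0.2]])   # delta^(l+1) from above

f_prime = sigmoid(u_l) * (1 - sigmoid(u_l))     # f'(u^(l)) for the sigmoid
delta_l = (W_next.T @ delta_next) * f_prime     # sensitivity of layer l
grad_W_l = delta_l @ x_prev.T                   # gradient dE/dW^(l)
print(delta_l.shape, grad_W_l.shape)
```

The shapes confirm the bookkeeping: delta^(l) has one entry per unit in layer l, and the gradient has the same shape as W^(l).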

Experiment and Result analysis
We have achieved good results on MNIST handwritten digit recognition with the two logistic-regression-based classifiers combined with a convolutional neural network. We use three kinds of classifiers in a comparison experiment: the logistic regression classifier, the softmax regression classifier, and the multilayer perceptron. We find that whether the classifier uses a hidden layer has a great influence on the convergence speed and the test results. The experimental results of the classifiers without a hidden layer are as follows. In this experiment, there are 60000 training samples, and the weight parameters are updated once every 50 samples, so one iteration updates the weights 1200 times.
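The update count stated above follows directly from the batch size; a minimal sketch of the bookkeeping:

```python
# Sketch: weight updates per iteration under mini-batch training, using
# the experiment's figures (60000 training samples, updates every 50 samples).
n_samples = 60000
batch_size = 50
updates_per_iteration = n_samples // batch_size
print(updates_per_iteration)   # 1200
```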

Conclusions
In this paper, we mainly study the use of classifiers based on logistic regression in the convolutional neural network, and introduce two kinds of classifier models based on logistic regression. There are many factors that affect classifier performance, such as whether to use a hidden layer, the number of hidden-layer neurons, and the