Boltzmann Machine and its Applications in Image Recognition

Overfitting is a common problem in neural networks and RBM models, and much research has been devoted to alleviating it. This paper builds a Weight uncertainty RBM model based on maximum likelihood estimation, and the experimental section verifies the effectiveness of the Weight uncertainty Deep Belief Network and the Weight uncertainty Deep Boltzmann Machine. To improve image recognition ability, we introduce the spike-and-slab RBM (ssRBM) into our Weight uncertainty RBM and build the Weight uncertainty spike-and-slab Deep Boltzmann Machine (wssDBM). The experiments show that the Weight uncertainty RBM, Weight uncertainty DBN and Weight uncertainty DBM are effective compared with the dropout method. Finally, we validate the effectiveness of the wssDBM in the experimental section.


Introduction
The Restricted Boltzmann Machine (RBM) is an unsupervised learning model that produces an alternative representation of the input data [1]. Many training algorithms exist for the RBM, such as the Contrastive Divergence (CD) algorithm, Persistent Markov chains and Mean Field methods. To make full use of the features extracted by RBMs, Hinton et al. built the Deep Belief Network (DBN) model [2][3][4]. The DBN provides a feasible way to train a Multilayer Perceptron through unsupervised pre-training. Another classic model in the deep learning field is the Deep Boltzmann Machine (DBM), which is powerful in image recognition and image reconstruction [5]. Many other powerful models exist in the deep learning field [6][7][8]. The Extreme Learning Machine (ELM) and the Multilayer Extreme Learning Machine perform well in classification problems [9]. Much research has also been done in the field of image recognition [10][11][12].
Overfitting is a common problem in neural networks, and many algorithms have been proposed to address it. The Dropout method alleviates overfitting and can also be used when training RBMs [13]. However, according to our experiments, the Dropout RBM is not good at image reconstruction, although it is powerful in image recognition. The Weight uncertainty method is also widely used in neural networks to alleviate overfitting [14]. In this paper, weight random variables are used in training the RBM to alleviate overfitting, and the experimental part validates the learning ability of the Weight uncertainty RBM model. In the classic RBM, the conditional distributions of the visible units are binary. In the Gaussian-binary RBM (mRBM) [15], the conditional distributions of the visible units are Gaussian. However, the mRBM does not perform well in modeling natural images. To improve image recognition ability, we introduce the spike-and-slab RBM (ssRBM) into our Weight uncertainty RBM and build the Weight uncertainty spike-and-slab Deep Boltzmann Machine (wssDBM). Finally, we validate the effectiveness of the wssDBM in the experimental section.
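
Restricted Boltzmann Machine
The energy function of the RBM is shown as formula 1:
$$E(v, h) = -a^{T} v - b^{T} h - v^{T} W h \qquad (1)$$
where a is the bias vector of the visible layer, b is the bias vector of the hidden layer, W is the weight matrix between visible units and hidden units, v is the visible layer vector, and h is the hidden layer vector. The probability based on $E(v, h)$ is shown as formula 2:
$$P(v, h) = \frac{1}{Z} e^{-E(v, h)}, \qquad Z = \sum_{v, h} e^{-E(v, h)} \qquad (2)$$
Over the whole training set, the likelihood function is defined as:
$$L(\theta) = \prod_{s=1}^{n_s} P(v^{(s)}) \qquad (3)$$
where $n_s$ is the number of samples. Many algorithms can be used to maximize the likelihood function, such as the Stochastic Gradient Descent algorithm. Let
$$\ell(\theta) = \ln L(\theta) = \sum_{s=1}^{n_s} \ln P(v^{(s)}) \qquad (4)$$
The derivative of the log-likelihood is shown as formula 5:
$$\frac{\partial \ell}{\partial \theta} = \sum_{s=1}^{n_s} \left( -\sum_{h} P(h \mid v^{(s)}) \frac{\partial E(v^{(s)}, h)}{\partial \theta} + \sum_{v, h} P(v, h) \frac{\partial E(v, h)}{\partial \theta} \right) \qquad (5)$$
where $\theta$ is the parameter. The conditional probabilities are:
$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$
Hinton et al. proposed the Contrastive Divergence (CD) algorithm to approximate Maximum Likelihood Estimation. The k-step Contrastive Divergence algorithm (CD-k) starts from a single sample $v^{(0)}$ and runs k steps of Gibbs sampling to obtain $v^{(k)}$, where k is the number of steps. The weights between visible units and hidden units, and the biases, are then updated as:
$$\Delta W_{ij} = \eta \left( P(h_j = 1 \mid v^{(0)})\, v^{(0)}_i - P(h_j = 1 \mid v^{(k)})\, v^{(k)}_i \right)$$
$$\Delta a_i = \eta \left( v^{(0)}_i - v^{(k)}_i \right), \qquad \Delta b_j = \eta \left( P(h_j = 1 \mid v^{(0)}) - P(h_j = 1 \mid v^{(k)}) \right)$$
where $\eta$ is the learning rate.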
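The CD-k procedure described above can be sketched as follows. This is a minimal NumPy illustration for a binary RBM, not the authors' implementation: the function name `cd_k_update`, the argument name `lr` for the learning rate, and the choice to use hidden probabilities rather than sampled hidden states in the final statistics are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, W, a, b, k=1, lr=0.1, rng=None):
    """One CD-k update from a single binary visible vector v0.

    W has shape (n_visible, n_hidden); a and b are the visible and
    hidden bias vectors. Returns updated copies of (W, a, b).
    """
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(b + v0 @ W)              # P(h = 1 | v0), data-driven phase
    vk, phk = v0, ph0
    for _ in range(k):                     # k steps of block Gibbs sampling
        h = (rng.random(phk.shape) < phk).astype(float)
        pv = sigmoid(a + W @ h)            # P(v = 1 | h)
        vk = (rng.random(pv.shape) < pv).astype(float)
        phk = sigmoid(b + vk @ W)          # P(h = 1 | vk)
    # Data-driven statistics minus statistics after k Gibbs steps
    W_new = W + lr * (np.outer(v0, ph0) - np.outer(vk, phk))
    a_new = a + lr * (v0 - vk)
    b_new = b + lr * (ph0 - phk)
    return W_new, a_new, b_new

# Usage on a toy 6-visible, 3-hidden RBM (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 3))
a, b = np.zeros(6), np.zeros(3)
v0 = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
W, a, b = cd_k_update(v0, W, a, b, k=1, rng=rng)
```

In practice the update is usually averaged over a mini-batch rather than applied per sample; the single-sample form is kept here only because it mirrors the description in the text.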

Spike-and-slab Restricted Boltzmann Machine
The ssRBM model was proposed to model both the expectation and the covariance of a Gaussian distribution. In the ssRBM, a slab variable is used to express the density. Given the slab variables, the conditional distribution of the visible units has a diagonal covariance matrix, so block Gibbs sampling can be used in the ssRBM. The model is defined by an energy function, from which the conditional probabilities are derived (formula 15). In this way, the conditional distribution of the visible units has a diagonal covariance matrix.

Training algorithms for RBM and Boltzmann Machine
Many training algorithms exist for the RBM model. Early on, Persistent Markov chains and the Simulated Annealing method were used to estimate the data-independent expectation and the data-dependent expectation. Although the CD algorithm does not estimate the size of the learning step accurately, it generally preserves the correct gradient direction. Based on the CD algorithm, the Persistent Contrastive Divergence (PCD) algorithm and the Persistent Contrastive Divergence algorithm with Fast weights (FPCD) were proposed. To reduce the sampling time in the training process, the Mean Field Method was proposed.

Mean Field Method
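The Mean Field Method is described in detail in the literature. In probabilistic graphical models, the true posterior distribution $P(h \mid v; \theta)$ is intractable, so it is replaced by a simpler approximating distribution $Q(h \mid v; \mu)$. What we need to do is minimize the KL divergence $\mathrm{KL}(Q(h \mid v; \mu) \,\|\, P(h \mid v; \theta))$. For the Mean Field Boltzmann Machine, $Q$ is fully factorized over the hidden units, $Q(h \mid v; \mu) = \prod_j q(h_j \mid v)$ with $q(h_j = 1 \mid v) = \mu_j$. The KL divergence can then be written as a function of $\mu$, and minimizing it gives the probability value of each hidden unit as the solution of a fixed-point equation in $\mu$.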
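The mean-field computation of the hidden-unit probabilities can be illustrated with a short fixed-point iteration. The sketch below assumes a Boltzmann Machine with visible-hidden weights `W`, symmetric hidden-hidden weights `J` with zero diagonal, and hidden biases `b`; these names, the parallel update schedule, and the stopping rule are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W, J, b, n_iter=50, tol=1e-6):
    """Fixed-point iteration for mu_j = q(h_j = 1 | v).

    W: (n_visible, n_hidden) visible-hidden weights.
    J: (n_hidden, n_hidden) symmetric hidden-hidden weights, zero diagonal.
    b: (n_hidden,) hidden biases.
    """
    mu = np.full(W.shape[1], 0.5)          # uninformative starting point
    for _ in range(n_iter):
        # Each unit sees the visible input plus the mean activity of the others
        mu_new = sigmoid(b + v @ W + mu @ J)
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new
        mu = mu_new
    return mu

# Usage on a toy model with 3 visible and 2 hidden units
v = np.array([1.0, 0.0, 1.0])
W = np.array([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]])
J = np.array([[0.0, 0.3], [0.3, 0.0]])
b = np.array([0.1, -0.1])
mu = mean_field(v, W, J, b)
```

When `J` is zero the hidden units are conditionally independent given `v`, and the iteration reduces to the exact RBM conditional. With strong hidden-hidden couplings, parallel updates may oscillate, in which case sequential or damped updates are a common alternative.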

Persistent Markov chain
The Persistent Markov chain algorithm is described in detail in references [16, 17]. If the Markov chain is long enough and the step size is not too large, the chain reaches its steady state. Persistent Markov chains can also be used to train Boltzmann Machines, where they give an effective approximation of the data-independent expectation. The quantities used in the Persistent Markov chain algorithm are the set of parameters, the vector of sufficient statistics, and the learning rate at step t.
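A persistent-chain training loop for a binary RBM can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm listing of the paper: the function name `persistent_chain_train`, the number of chains, and the particular decreasing step-size schedule `lr0 / (1 + t / 10)` are all hypothetical choices. The key point is that the chain states are kept between parameter updates instead of being restarted from the data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def persistent_chain_train(data, n_hidden, n_steps=100, n_chains=10,
                           lr0=0.05, seed=0):
    """Train a binary RBM with persistent Markov chains."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    # Persistent chain states, never reset during training
    v_chain = rng.integers(0, 2, size=(n_chains, n_visible)).astype(float)
    for t in range(n_steps):
        lr = lr0 / (1.0 + t / 10.0)        # decreasing learning rate at step t
        # Data-dependent statistics
        ph_data = sigmoid(b + data @ W)
        # One Gibbs step per chain under the current parameters
        ph = sigmoid(b + v_chain @ W)
        h_chain = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(a + h_chain @ W.T)
        v_chain = (rng.random(pv.shape) < pv).astype(float)
        ph_chain = sigmoid(b + v_chain @ W)
        # Data statistics minus chain (data-independent) statistics
        W += lr * (data.T @ ph_data / len(data) - v_chain.T @ ph_chain / n_chains)
        a += lr * (data.mean(axis=0) - v_chain.mean(axis=0))
        b += lr * (ph_data.mean(axis=0) - ph_chain.mean(axis=0))
    return W, a, b, v_chain

# Usage on random binary data (for shape illustration only)
data = np.random.default_rng(1).integers(0, 2, size=(20, 6)).astype(float)
W, a, b, chains = persistent_chain_train(data, n_hidden=4, n_steps=20, n_chains=5)
```

The step size is decreased over time because the chains only track the model distribution well when the parameters change slowly between Gibbs steps.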

Deep Belief Networks
The DBN is a hybrid network proposed by Hinton et al. in 2006. The top two layers form an associative memory with undirected connections, and the layers below have directed, top-down generative connections. During training, the DBN is initialized layer by layer. Suppose that the DBN has infinitely many layers, all initialized with the same weight W0. The model can then be treated as an RBM in the training process, as shown in Fig 2 (a). After the first layer of the DBN is trained, its weights are held constant and the other weights are replaced by W1. In this way, the prior information is updated layer by layer. Hinton et al. proved that the pre-training process can tighten the variational bound on the log-likelihood of the data, and the pre-training process of DBN is shown as
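The layer-by-layer initialization can be sketched as a greedy loop in which each RBM is trained on the hidden activations of the frozen layer below. The code is a minimal illustration, assuming a batch CD-1 trainer; the helper names `train_rbm` and `pretrain_dbn`, the epoch count, and the use of hidden probabilities as input to the next layer are assumptions, not details given in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=None):
    """Batch CD-1 training of one RBM; returns (W, a, b)."""
    rng = np.random.default_rng() if rng is None else rng
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    n = len(data)
    for _ in range(epochs):
        ph0 = sigmoid(b + data @ W)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(a + h0 @ W.T)        # reconstruction probabilities
        ph1 = sigmoid(b + pv1 @ W)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / n
        a += lr * (data - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

def pretrain_dbn(data, layer_sizes, rng=None):
    """Greedy layer-wise pre-training of a stack of RBMs."""
    rng = np.random.default_rng() if rng is None else rng
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(x, n_hidden, rng=rng)
        layers.append((W, a, b))           # this layer is now frozen
        x = sigmoid(b + x @ W)             # its output feeds the next RBM
    return layers

# Usage: a toy 8-5-3 stack on random binary data
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(20, 8)).astype(float)
layers = pretrain_dbn(data, [5, 3], rng=rng)
```

After pre-training, the stacked weights would typically serve as the initialization for supervised fine-tuning of the whole network.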

Deep Boltzmann Machine
Unlike the DBN, the DBM is still a Boltzmann Machine in topology. In the training process of the DBM, the activation of each unit depends on the units in the adjacent layers. Salakhutdinov pointed out that the training process of the DBM can also be carried out layer by layer. However, unlike in the DBN, Salakhutdinov showed that different effects can be obtained by replacing the prior information in different proportions.
In the expressions for the probabilities in the DBM model, the superscripts denote the layer number. The log-likelihood can be approximated by using the Stochastic Approximation algorithm and the Mean Field algorithm.
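As an illustration of how each layer depends on its adjacent layers, the conditionals of a DBM with two hidden layers and no bias terms take the following standard form. The two-hidden-layer restriction, the omission of biases, and the symbols $W^{(1)}$, $W^{(2)}$, $h^{(1)}$, $h^{(2)}$ are assumptions made for this example and may differ from the notation of the paper.

```latex
\begin{align*}
E(v, h^{(1)}, h^{(2)}) &= -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)} \\
P(h^{(1)}_j = 1 \mid v, h^{(2)}) &= \sigma\Big(\sum_i W^{(1)}_{ij} v_i + \sum_m W^{(2)}_{jm} h^{(2)}_m\Big) \\
P(h^{(2)}_m = 1 \mid h^{(1)}) &= \sigma\Big(\sum_j W^{(2)}_{jm} h^{(1)}_j\Big) \\
P(v_i = 1 \mid h^{(1)}) &= \sigma\Big(\sum_j W^{(1)}_{ij} h^{(1)}_j\Big)
\end{align*}
```

The first hidden layer receives input from both the layer below and the layer above, which is the main difference from the DBN, where inference during pre-training is purely bottom-up.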

Weight uncertainty method
Throughout the training process, the weights and biases must be estimated, and they are usually regarded as real-valued variables. In this case, training a neural network is prone to overfitting. Much research has addressed the overfitting problem in neural networks; for RBMs, the main approach is the dropout method. Although the dropout RBM helps alleviate overfitting in classification, its image reconstruction ability is no better than that of the conventional RBM. If the weights are treated as random variables, these problems may be alleviated. We assume that the weights follow a Gaussian distribution, so what we need to estimate is the expectation and the covariance. Generating a particular set of weights then amounts to sampling from this Gaussian distribution. Therefore, the Weight uncertainty neural network can be viewed as an ensemble of neural networks.
In the work of Blundell et al., every weight in the network is represented by a probability distribution rather than a single real value. The objective is to find a variational approximation to the Bayesian posterior distribution on the weights, and the objective function follows from this formulation. Following the idea of MAP estimation, the derivatives with respect to the parameters of the weight distribution are obtained by the chain rule. In the experimental section, we test the classification ability and image reconstruction ability of the Weight uncertainty RBM model (WRBM), and then build the DBN and DBM based on the WRBM.
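The sampling of Gaussian weights and the chain-rule step can be sketched as follows. The example assumes the parameterization used by Blundell et al., in which the standard deviation is written as a softplus of a free parameter `rho`; whether the WRBM uses exactly this parameterization is not stated here, so the names `mu`, `rho`, `sample_weights` and `chain_rule_grads` should be read as hypothetical.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)

def sample_weights(mu, rho, rng):
    """Draw W = mu + sigma * eps with sigma = softplus(rho) > 0."""
    eps = rng.standard_normal(mu.shape)
    return mu + softplus(rho) * eps, eps

def chain_rule_grads(grad_W, eps, rho):
    """Map a gradient w.r.t. the sampled W onto (mu, rho).

    dW/dmu = 1 and dW/drho = eps * sigmoid(rho), because the
    derivative of softplus(rho) is sigmoid(rho).
    """
    grad_mu = grad_W
    grad_rho = grad_W * eps / (1.0 + np.exp(-rho))
    return grad_mu, grad_rho

# Usage: one stochastic gradient step on a toy objective f(W) = 0.5 * sum(W**2),
# whose gradient with respect to W is W itself
rng = np.random.default_rng(0)
mu = rng.standard_normal((3, 2))
rho = np.full((3, 2), -3.0)                # small initial standard deviation
W, eps = sample_weights(mu, rho, rng)
g_mu, g_rho = chain_rule_grads(W, eps, rho)
mu, rho = mu - 0.1 * g_mu, rho - 0.1 * g_rho
```

For an RBM, `grad_W` would be replaced by the (approximate) likelihood gradient with respect to the sampled weights, for example the CD statistics. Because a fresh `W` is drawn at every step, training behaves like averaging over an ensemble of networks, which matches the interpretation given in the text.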

Weight uncertainty spike-and-slab Deep Boltzmann Machine
The ssRBM is used to model natural images. In this paper, we use the ssRBM as the feature extractor to build a DBM model, and then introduce weight random variables into the DBM to build the wssDBM. Finally, we validate the effectiveness of the wssDBM in the experimental section.

Experimental analysis
First, we compare the WRBM with the RBM and the dropout RBM in classification and image reconstruction. The fine-tuning process uses the conjugate gradient algorithm with 100 iterations. In this experiment, we use MNIST, MNIST-Basic and Rectangles as the test data sets. The attributes of these data sets are shown in Table 2. We first test the image recognition ability of the WRBM. The test results are shown in Table 3.
Table 3. The number of misclassifications in shallow models
As Table 3 shows, the WRBM classifies more accurately than the RBM and the dropout RBM. That is to say, like the dropout method, weight random variables are useful in classification problems.

MNIST-Basic
The reconstruction errors in the training process are shown in Table 4. As Table 4 shows, the image reconstruction ability of the WRBM is better than that of the other models, so weight random variables are also useful in image reconstruction.
The topologies of the DBM and the Weight uncertainty DBM (WDBM) are 784-1000-1000-10, and the topologies of the DBN and the Weight uncertainty DBN (WDBN) are 784-1000-2000-10. The RBM training process runs for 200 iterations and the DBM training process runs for 300 iterations. The testing accuracies are shown in Table 5.
Table 5. The number of misclassifications of DBN and DBM