A Deep Learning Approach Based on CSP for EEG Analysis

Deep learning approaches have been used successfully in computer vision, natural language processing and speech processing. However, very few studies have applied deep learning to brain-computer interfaces (BCI) based on electroencephalography (EEG). In this paper, we present a deep learning approach for motor imagery (MI) EEG signal classification. We first apply a spatial projection to the EEG signal using common spatial patterns (CSP) and then apply a temporal projection to the spatially filtered signal. The result is fed to a single-layer neural network for classification. We use the backpropagation (BP) algorithm to fine-tune the parameters of the approach. The effectiveness of the proposed approach is evaluated on dataset III of BCI competition II and dataset 2a of BCI competition IV.


Introduction
A brain-computer interface (BCI) is a communication system established between the human brain and computers or external devices that does not rely on the normal peripheral nerve and muscle pathways [1]. A BCI system acquires EEG signals from the brain, extracts features, classifies the EEG and translates it into machine-readable control commands. The main goal of BCI systems is to enhance the abilities of people affected by motor disabilities. Medical applications of BCI mainly include sensory recovery, cognitive recovery, rehabilitation treatment and brain-controlled wheelchairs [2]. In non-medical areas, BCI can be applied to new types of entertainment games, car driving, robot replacements, lie detectors [3], etc. BCI also has a wide range of applications in aviation and the military industry.
In recent years, the revolutionary advances of deep learning in audio and visual signal recognition have attracted significant attention. Some recent deep learning based EEG classification approaches have improved recognition accuracy. An et al. applied a deep belief network (DBN) model to two-class MI classification and showed that the DBN outperformed the SVM method [17]. Yousef et al. applied convolutional neural networks (CNN) and stacked autoencoders (SAE) to classify MI EEG signals [18,19]. Schirrmeister et al. proposed deep convolutional neural networks (deep ConvNets) for end-to-end EEG analysis. Their study shows how to design and train ConvNets to decode task-related information from raw EEG without handcrafted features, and highlights the potential of deep ConvNets combined with advanced visualization techniques for EEG-based brain mapping [20].
In this paper, we propose a framework based on CSP and the backpropagation algorithm for MI-EEG analysis. To evaluate the proposed framework, we trained and tested it on BCI competition II dataset III and BCI competition IV dataset 2a. The remainder of this paper is organized as follows. Section 2 describes the proposed framework. Section 3 describes the experiments and results on the evaluation data of BCI competition II dataset III and BCI competition IV dataset 2a. Finally, Section 4 concludes the paper.

Methods
The structure of the proposed framework is shown in Fig. 1. The framework consists of 4 stages. The first stage is a band-pass filter applied to the raw EEG data. The second stage performs spatial filtering using the CSP algorithm. The third stage applies a temporal projection to the spatially filtered signal. The last stage is a single-layer neural network that serves as the classification layer. The following sections explain the stages in detail.

Band-pass filtering
As described in Section 1, event-related synchronization and desynchronization (ERS/ERD) occur when a person performs MI tasks. To extract the EEG activity in the mu and beta bands, the raw EEG data are first filtered with a band-pass filter covering 8-30 Hz.
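The text does not say which filter design was used, so the following is only a minimal sketch of an 8-30 Hz band-pass, implemented here as an FFT mask with NumPy (a stand-in, not the authors' filter). The function name `bandpass_fft`, the 128 Hz sampling rate and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np

def bandpass_fft(eeg, fs, low=8.0, high=30.0):
    """Zero every frequency component outside [low, high] Hz.

    eeg: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    This is an idealized (brick-wall) filter, used here only as a sketch.
    """
    n_samples = eeg.shape[-1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectrum = np.fft.rfft(eeg, axis=-1)
    keep = (freqs >= low) & (freqs <= high)
    spectrum[..., ~keep] = 0.0
    return np.fft.irfft(spectrum, n=n_samples, axis=-1)

# Synthetic check: a 10 Hz component (inside the band) plus a 50 Hz one (outside).
fs = 128.0  # assumed sampling rate
t = np.arange(0, 4.0, 1.0 / fs)
inside = np.sin(2 * np.pi * 10 * t)
outside = np.sin(2 * np.pi * 50 * t)
raw = np.stack([inside + outside, inside - outside])  # two hypothetical channels
filtered = bandpass_fft(raw, fs)
```

After filtering, both channels should contain only the 10 Hz component. In practice a causal or zero-phase IIR/FIR design would normally be preferred over a brick-wall mask, which can introduce ringing on real EEG.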

CSP algorithm
The CSP algorithm is highly successful in calculating spatial filters for detecting ERD/ERS. The main idea is to use a linear transform to project the multi-channel EEG data into a low-dimensional spatial subspace with a projection matrix, each row of which consists of weights for the channels [21]. This transformation maximizes the variance of the signals of one class while minimizing the variance of the other. The CSP algorithm performs spatial filtering using

$$Z_i = W_{csp}^{T} E_i$$

where $E_i$ is an $n \times t$ matrix representing the raw EEG measurement data of the $i$th trial, $n$ is the number of channels and $t$ is the number of measurement samples per channel. $W_{csp}$ denotes the CSP projection matrix, $T$ denotes the transpose operator and $Z_i$ denotes the spatially filtered signal. The CSP matrix can be computed by solving the eigenvalue decomposition problem

$$S_1 W_{csp} = (S_1 + S_2) W_{csp} D$$

where $S_1$ and $S_2$ are estimates of the covariance matrices of the band-pass filtered EEG measurements of the respective motor imagery actions, and $D$ is the diagonal matrix that contains the eigenvalues of $S_1$.
However, only a small number $m$ of the spatially filtered signals is generally used as features. We therefore perform another transform to obtain the spatially filtered signal. It is given by

$$Z = \bar{W}_{csp}^{T} E$$

where $\bar{W}_{csp}$ consists of the first $m$ and the last $m$ columns of $W_{csp}$, so that the spatially filtered signal $Z$ is a $2m \times t$ matrix.
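The CSP computation described above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names `mean_cov` and `csp`, the trace normalization of each trial covariance, and the synthetic two-class data are all hypothetical choices. The generalized eigenvalue problem is solved by first whitening $S_1 + S_2$.

```python
import numpy as np

def mean_cov(trials):
    """Average trace-normalized covariance over trials of shape (n_trials, n, t)."""
    return np.mean([E @ E.T / np.trace(E @ E.T) for E in trials], axis=0)

def csp(S1, S2):
    """Solve S1 W = (S1 + S2) W D; returns W (filters as columns) and diag(D)."""
    # Whiten the composite covariance so the generalized problem becomes
    # an ordinary symmetric eigenproblem.
    evals, evecs = np.linalg.eigh(S1 + S2)
    P = evecs / np.sqrt(evals)           # P.T @ (S1 + S2) @ P = I
    d, U = np.linalg.eigh(P.T @ S1 @ P)  # eigenvalues in ascending order
    return P @ U, d

# Synthetic two-class data: class 1 is strong on channel 0, class 2 on channel 2.
rng = np.random.default_rng(0)
n_channels, n_samples, m = 3, 384, 1
scale_1 = np.array([3.0, 1.0, 1.0])[:, None]
scale_2 = np.array([1.0, 1.0, 3.0])[:, None]
trials_1 = rng.standard_normal((20, n_channels, n_samples)) * scale_1
trials_2 = rng.standard_normal((20, n_channels, n_samples)) * scale_2

S1, S2 = mean_cov(trials_1), mean_cov(trials_2)
W, d = csp(S1, S2)

# Keep the first m and the last m columns, then filter one trial.
W_bar = np.hstack([W[:, :m], W[:, -m:]])
Z = W_bar.T @ trials_1[0]                # shape (2m, t)
```

The first and last columns of `W` correspond to the smallest and largest eigenvalues, that is, to the filters whose output variance differs most between the two classes.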

Joint optimization using backpropagation
Mathematically, the third and fourth stages can be described as follows. Given the spatially filtered signal $Z$, the temporal projection matrix $V$, the classifier weights $W_c$ and the bias $b$, the network computes $S$, a vector of class scores that is passed to an activation function. The output of the framework is given by

$$y = f(S)$$

where $y$ is the vector of class probabilities and $f(\cdot)$ is the activation function, here the softmax function. Softmax regression is a generalization of logistic regression to the case of multiple classes. The softmax output for class $j$ is given by

$$y_j = \frac{e^{S_j}}{\sum_{k=1}^{N} e^{S_k}}$$

where $N$ is the number of classes. The free parameters of the third and fourth stages are the temporal projection matrix $V$, the classifier weights $W_c$ and the bias $b$. These parameters are learned with the backpropagation algorithm. The labeled training set is fed to the network and the error $E$ (cost function) is computed. The model parameters are then updated by gradient descent so as to minimize the error:

$$V \leftarrow V - \eta \frac{\partial E}{\partial V}, \qquad W_c \leftarrow W_c - \eta \frac{\partial E}{\partial W_c}, \qquad b \leftarrow b - \eta \frac{\partial E}{\partial b}$$

where $\eta$ denotes the learning rate of the algorithm. $V$ is initialized to a matrix of all ones, and $W_c$ is randomly initialized from a Gaussian distribution. Finally, the trained framework is used to classify the new samples in the test set.
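To make the softmax output and the gradient-descent update concrete, here is a minimal NumPy sketch of the classification layer alone. Several things are assumptions not stated in the text: the scores are taken to be linear in a feature vector, `S = W_c @ x + b`; the cost is taken to be cross-entropy; and `x` is a hypothetical feature vector standing in for the output of the temporal projection stage. The update of the temporal projection matrix itself is omitted because its exact form cannot be recovered from the description.

```python
import numpy as np

def softmax(s):
    """Numerically stable softmax of a score vector."""
    e = np.exp(s - np.max(s))
    return e / e.sum()

def sgd_step(W_c, b, x, target, eta):
    """One gradient-descent update of (W_c, b) on a single labeled sample.

    Assumes S = W_c @ x + b and a cross-entropy cost, for which the
    gradient with respect to S is (y - one_hot(target)).
    """
    y = softmax(W_c @ x + b)
    loss = -np.log(y[target])
    grad_s = y.copy()
    grad_s[target] -= 1.0
    W_c_new = W_c - eta * np.outer(grad_s, x)
    b_new = b - eta * grad_s
    return W_c_new, b_new, loss

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                # hypothetical feature vector
W_c = 0.01 * rng.standard_normal((2, 4))  # Gaussian initialization, two classes
b = np.zeros(2)

losses = []
for _ in range(20):
    W_c, b, loss = sgd_step(W_c, b, x, target=1, eta=0.03)
    losses.append(loss)
```

Repeating the step on the same sample should steadily lower the loss, which is a quick sanity check that the sign of the update is right.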

Experiments with BCI competition datasets
In this section, we apply the proposed framework to the BCI competition datasets, and the results of the proposed approach on these datasets are presented.

BCI competition II, dataset III
The first dataset is dataset III from BCI competition II. The dataset contains MI task experiments for right hand and left hand movements. EEG signals were recorded at the C3, Cz and C4 channels. During acquisition of the EEG signals, at t = 2 s an acoustic stimulus indicated the beginning of the trial and a cross '+' was displayed for 1 s. Then, at t = 3 s, an arrow (left or right) was displayed to ask the subject to perform the corresponding MI task. The dataset contains 280 trials, 140 for training and 140 for testing. For each EEG trial, we extracted the time interval from 0.5 s to 3.5 s after the cue was displayed. To evaluate our method on the dataset, we used the network shown in Fig. 1 and described in Section 2, which consists of a band-pass filter, CSP spatial projection, temporal projection and a single-layer neural network. The framework was trained on the 140 trials of the training set and tested on the 140 trials of the test set. Stochastic gradient descent (SGD) was used to update the parameters and minimize the error $E$. In each training epoch, the mini-batch was a randomly selected half of the training data.
The results on BCI competition II dataset III are shown in Table 1. We obtained the best results when the learning rate $\eta$ was fixed at 0.03. Our method achieved an accuracy of 90.0%. The accuracy of the winning algorithm of the competition is 89.3%. We compared our results with two studies that use deep learning networks (CNN and CNN-SAE) [18,19]. The accuracies of CNN and CNN-SAE are 89.3% and 90.0% respectively. CSP-LR is a conventional method for MI-EEG analysis that does not use deep learning: it uses CSP for feature extraction and logistic regression for classification. We also compared our results with the CSP-LR method, which reached an accuracy of 88.9%. The kappa values of these methods are also given in Table 1. The kappa value measures classification performance after removing the effect of the accuracy of random classification. Kappa is calculated as

$$\mathrm{kappa} = \frac{acc - \frac{1}{N}}{1 - \frac{1}{N}} \qquad (11)$$

where $acc$ is the classification accuracy and $N$ denotes the number of classes. In this dataset $N$ is 2. As shown in Table 1, the accuracy of the proposed method is equal to that of CNN-SAE and better than those of the competition winner, the CNN method and CSP-LR.
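As a short worked example of the kappa formula, the sketch below converts the accuracies quoted in this section into kappa values for the two-class case. The function name `kappa_from_accuracy` is illustrative.

```python
def kappa_from_accuracy(acc, n_classes):
    """Kappa as defined in Eq. (11): accuracy corrected for chance level 1/N."""
    chance = 1.0 / n_classes
    return (acc - chance) / (1.0 - chance)

# Two-class accuracies reported for BCI competition II dataset III.
kappa_proposed = kappa_from_accuracy(0.900, 2)  # about 0.80
kappa_winner = kappa_from_accuracy(0.893, 2)    # about 0.786
kappa_csp_lr = kappa_from_accuracy(0.889, 2)    # about 0.778
```

With two classes the chance level is 0.5, so kappa is simply twice the accuracy minus one; an accuracy at chance level gives a kappa of 0.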

BCI competition IV, dataset 2a
BCI competition IV dataset 2a comprises 4 classes of motor imagery EEG measurements from 9 subjects, namely left hand, right hand, feet and tongue. Two sessions, one for training and the other for evaluation, were recorded from each subject. Each session comprises 288 trials of data recorded with 22 EEG channels and 3 monopolar electrooculogram (EOG) channels. Each trial starts with a short acoustic stimulus and a fixation cross. Then, at t = 3 s, an arrow indicates the MI task. The arrow is displayed for 1.25 s. The subjects then have 4 s to imagine the task. Unlike BCI competition II dataset III, dataset 2a has 4 classes. For the spatial projection we therefore use one-versus-rest CSP (OVR-CSP) [22] to obtain the spatially filtered signals. The architecture of the framework described in Section 2 is changed accordingly, as shown in Fig. 2. The number of temporal projection matrices to be fine-tuned increases to 4. The 4 temporal projection matrices are initialized to matrices of all ones and are updated together using the backpropagation algorithm.
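A one-versus-rest extension of CSP can be sketched as follows: for each class, a two-class CSP problem is solved between that class and all remaining trials, which yields one bank of spatial filters per class. This is only an illustration under assumptions; the helper names, the trace-normalized covariance estimate, the value `m=2` and the random stand-in data are hypothetical, and the details of the OVR-CSP variant in [22] may differ.

```python
import numpy as np

def mean_cov(trials):
    """Average trace-normalized covariance over trials of shape (n_trials, n, t)."""
    return np.mean([E @ E.T / np.trace(E @ E.T) for E in trials], axis=0)

def csp(S_a, S_b):
    """Solve S_a W = (S_a + S_b) W D via whitening; filters are the columns of W."""
    evals, evecs = np.linalg.eigh(S_a + S_b)
    P = evecs / np.sqrt(evals)
    _, U = np.linalg.eigh(P.T @ S_a @ P)
    return P @ U

def ovr_csp(trials, labels, m):
    """Return {class: n x 2m filter matrix}, one CSP problem per class vs. the rest."""
    filters = {}
    for k in np.unique(labels):
        W = csp(mean_cov(trials[labels == k]), mean_cov(trials[labels != k]))
        filters[int(k)] = np.hstack([W[:, :m], W[:, -m:]])
    return filters

# Random stand-in data: 40 trials, 22 channels, 4 classes with 10 trials each.
rng = np.random.default_rng(0)
trials = rng.standard_normal((40, 22, 250))
labels = np.repeat(np.arange(4), 10)

filters = ovr_csp(trials, labels, m=2)
# Each trial yields four spatially filtered signals, one per filter bank.
spatially_filtered = {k: W.T @ trials[0] for k, W in filters.items()}
```

Each of the four filtered signals would then pass through its own temporal projection, which matches the four temporal projection matrices mentioned above.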

Fig. 2. Diagram of the proposed framework based on OVR-CSP
For each EEG trial, we extracted the time interval from 1 s to 5 s after the cue was displayed. The framework was trained on the training data and tested on the test data. SGD was used to update the parameters. The mini-batch was a randomly selected quarter of the training data.
The accuracies of the proposed method and CSP-LR are shown in Table 2. The kappa values of the proposed method and CSP-LR are compared with those of FBCSP (the winning algorithm of the competition) [9] in Table 3. The proposed deep learning method obtained higher accuracies and better kappa values than the CSP-LR method for all subjects. For subjects 1, 2, 3, 8 and 9, our approach has better kappa values than FBCSP. For subjects 4, 5, 6 and 7, our approach has lower kappa values than FBCSP. The average kappa value of our approach is 0.583, which is higher than that of FBCSP (0.569).

Conclusion
In this study, we proposed a deep learning approach for MI-EEG analysis. We designed a framework that combines the backpropagation algorithm with CSP. A band-pass filter is applied to the raw EEG data, and the CSP algorithm is used for spatial filtering. A temporal projection then yields the features, which are fed to a single-layer neural network for classification. The free parameters of the framework are fine-tuned with the backpropagation algorithm for the best classification accuracy. We applied the proposed framework to two BCI competition datasets: dataset III from BCI competition II and dataset 2a from BCI competition IV. On dataset III, our method reached an accuracy of 90.0%, which is equal to that of the CNN-SAE method and higher than those of the winning algorithm of competition II and the CNN method. On dataset 2a from BCI competition IV, our method obtained an average kappa value of 0.583, which is better than that of FBCSP. Furthermore, on both datasets our method outperformed the CSP-LR method, which does not use deep learning.
Although deep learning methods have advanced greatly in computer vision, natural language processing and speech processing, their application in EEG-based BCI is still rare. Our results show that deep learning methods have great potential to become a powerful tool for EEG analysis and EEG-based BCI. We believe that the number of BCI studies using deep learning methods will increase rapidly.