Comparison of Machine Learning and Deep Learning Approaches for Decoding Brain Computer Interface: An fNIRS Study

Recently, deep learning has gained great attention in decoding neurophysiological signals. However, whether classical machine learning or deep learning performs better for decoding the functional near-infrared spectroscopy (fNIRS) signal has not yet been fully verified. Thus, in this paper, we systematically compared the performance of several classical machine learning methods and a deep learning method in fNIRS data processing for decoding a mental arithmetic task. The classical machine learning methods, including decision tree, linear discriminant analysis (LDA), support vector machine (SVM), K-nearest neighbor (KNN), and ensemble methods, were used with strict feature extraction and screening, while the long short-term memory fully convolutional network (LSTM-FCN) was applied as a representative of deep learning methods. Results showed that SVM achieved the best classification performance among the classical machine learning methods, with average accuracies of 91.0% and 83.0% under the subject-related and subject-unrelated conditions, respectively. Furthermore, the classification accuracy of deep learning was significantly better than that of the classical machine learning methods considered, reaching 95.3% under the subject-related condition and 97.1% under the subject-unrelated condition. Thus, this paper shows the excellent performance of LSTM-FCN, as a representative of deep learning, in decoding brain signals from an fNIRS dataset, where it outperformed many classical machine learning methods.


Introduction
The brain computer interface (BCI) is a technology that provides communication between the human or animal brain and the external environment [1]. When the brain performs a functional task, it activates (or suppresses) the task-related brain regions, directly affecting the regional cerebral blood flow (rCBF) and cerebral blood volume (CBV). These changes eventually manifest as rapid changes (elevation or decline) in blood oxygen levels in the corresponding regions of the brain, which is called neurovascular coupling [2]. Cerebral nerve activity thus causes corresponding changes in blood oxygen levels, and these changes in turn affect the magnetic and optical properties of brain tissue. It is well known that water, oxygenated hemoglobin (oxy-Hb), and deoxygenated hemoglobin (deoxy-Hb) have different absorption coefficients for near-infrared light of different wavelengths [3,4,5]. According to neurovascular coupling, when a functional cognitive task is performed, the oxygen demand in the active area increases, together with the perfused cerebral blood flow. In functional cognitive tasks, the activated brain region generally shows an increase in the concentration of oxy-Hb and total hemoglobin (t-Hb) and a decrease in the concentration of deoxy-Hb. The well-known functional magnetic resonance imaging technique generates blood-oxygen-level-dependent (BOLD) signals from the magnetic changes caused by changes in hemoglobin concentration [6], while fNIRS measures oxy-Hb and deoxy-Hb through the absorption of near-infrared light at wavelengths around 704 nm and 887 nm [3,7,8].
In 2007, Coyle [9], Sitaram [10], and Naito [11] demonstrated the feasibility of controlling the output of fNIRS-BCI. Today, researchers can identify motor execution, motor imagery, mental arithmetic, and music imagery with fNIRS-BCI [12]. Although deep learning has become more and more popular in signal processing, it has not attracted enough attention from fNIRS-BCI researchers in recent years, possibly because of the relatively small sample sizes in fNIRS-BCI experiments. For example, in 2015, Johannes Hennrich et al. compared a deep neural network with some classical machine learning methods and showed that the deep neural network did not yield higher classification rates than shrinkage LDA [13]. However, whether classical machine learning or deep learning performs better for decoding the brain signal has not yet been fully verified. Thus, in this paper, a public fNIRS-BCI mental arithmetic dataset was used to find out which approach (classical machine learning or deep learning) performs better in fNIRS-BCI data processing for the brain decoding task. In order to improve the performance of machine learning, a new feature screening method was used to find the channels and time periods of the fNIRS mental arithmetic dataset that contribute positively to classification.

Experimental Design and Dataset
In this paper, a public fNIRS-BCI dataset was used for performance validation: a mental arithmetic dataset of the prefrontal and temporal lobes with a total of eight subjects, collected and published by the Neuroengineering Laboratory of Graz University [14,15]. The experimental paradigm and data recording are detailed in the following sub-sections.

Experimental paradigm
During the experiment, all eight subjects were first asked to keep their eyes on the computer screen. The screen was black before the task was activated. Before the mental arithmetic task started, a green line appeared on the screen and lasted for two seconds. When the mental arithmetic task started, the task prompt (e.g., 97-4) appeared on the screen for one second. The subject then needed to follow the prompt and perform the calculation mentally. Namely, the subject should repeatedly subtract 4 starting from 97 (97, 93, 89, 85, ...) until the green line in the middle of the screen disappeared. Specifically, the green line appeared for 14 seconds in each trial. The first two seconds indicated that the trial was about to start. After the green line had been shown for 2 seconds, the calculation formula (e.g., 97-4) appeared above it for 1 second. Then, the subject was asked to watch the screen and perform mental arithmetic until the green line disappeared (the mental arithmetic task lasted for 12 seconds). After the green line disappeared, the subject continued to watch the black screen and waited for the next trial (a resting period without the mental arithmetic task).

Data recording
A continuous-wave system (ETG-400, Hitachi Medical Co, Japan) was used as the experimental instrument. Using 16 photo-detectors and 17 light emitters arranged in a 3×11 grid probe, 52 channels in total were distributed over the prefrontal and temporal lobes, and each one was capable of measuring oxy-Hb, deoxy-Hb, and t-Hb concentrations. The sampling rate was 10 Hz and the distance between source and detector was 3 cm. The channels closest to the nose were arranged along the FP1-FP2 line of the international 10-20 electroencephalogram system, and channel 48 was located at the FP1 position (as shown in Fig. 1).
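The optode and channel counts above can be checked with a little arithmetic. The sketch below assumes (this is not stated in the text) that emitters and detectors alternate in a checkerboard pattern on the 3×11 grid, so that every pair of horizontally or vertically adjacent optodes forms one channel. The final product with the three hemoglobin signals is offered only as a plausible origin of the 156 "channels" that appear in the later sections.

```python
ROWS, COLS = 3, 11  # 3x11 optode grid (from the text)

def count_optodes(rows, cols):
    """Return (emitters, detectors) for a checkerboard layout whose corner is an emitter."""
    emitters = sum((r + c) % 2 == 0 for r in range(rows) for c in range(cols))
    return emitters, rows * cols - emitters

def count_channels(rows, cols):
    """Each horizontally or vertically adjacent optode pair is counted as one channel."""
    horizontal = rows * (cols - 1)
    vertical = (rows - 1) * cols
    return horizontal + vertical

emitters, detectors = count_optodes(ROWS, COLS)
channels = count_channels(ROWS, COLS)
signals_per_channel = 3  # oxy-Hb, deoxy-Hb, t-Hb

print(emitters, detectors)             # 17 16
print(channels)                        # 52
print(channels * signals_per_channel)  # 156
```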

Data Preprocessing
It is well known that captured fNIRS data contain physiological interference. In order to obtain a brain functional activation signal with a high signal-to-noise ratio (SNR), it is necessary to preprocess the original data to reduce the physiological noise caused by activities such as heartbeat, respiration, and blood pressure fluctuations [16]. Physiological noise generated by heartbeat, respiration, blood pressure, and Mayer waves is a relatively stable and statistically independent interference signal [17]. The most common method for dealing with physiological noise is a digital band-pass filter, which removes the effects of heartbeat, respiration, Mayer waves, and baseline drift based on the frequency of each noise source. In our experiment, a finite impulse response (FIR) band-pass filter was used to pass the 0.05 to 0.7 Hz band, aiming to improve the SNR of the signal.
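A minimal sketch of such a filter is given here, using a windowed-sinc design in NumPy. Only the pass band (0.05 to 0.7 Hz) and the 10 Hz sampling rate come from the text; the filter length, the Hamming window, and the function names are assumptions made for illustration, and the filter actually used in the study may differ.

```python
import numpy as np

FS = 10.0               # sampling rate in Hz (from the text)
F_LO, F_HI = 0.05, 0.7  # pass band in Hz (from the text)
NUM_TAPS = 401          # ASSUMPTION: filter length is not given in the text

def design_bandpass(fs, f_lo, f_hi, num_taps):
    """Windowed-sinc FIR band-pass: difference of two ideal low-pass filters, Hamming-windowed."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0

    def lowpass(fc):
        return 2.0 * fc / fs * np.sinc(2.0 * fc / fs * n)

    return (lowpass(f_hi) - lowpass(f_lo)) * np.hamming(num_taps)

def gain(h, f, fs):
    """Magnitude response of the FIR filter h at frequency f (Hz)."""
    k = np.arange(len(h))
    return abs(np.sum(h * np.exp(-2j * np.pi * f * k / fs)))

def filter_recording(x, h):
    """Filter along the last (time) axis. The recording must be at least as long
    as the filter; 'same' mode then keeps the length and, for a symmetric
    odd-length h, introduces no delay."""
    x = np.asarray(x, dtype=float)
    if x.shape[-1] < len(h):
        raise ValueError("recording is shorter than the filter")
    return np.apply_along_axis(lambda s: np.convolve(s, h, mode="same"), -1, x)

h = design_bandpass(FS, F_LO, F_HI, NUM_TAPS)

# Synthetic check: a slow 0.3 Hz component plus a 2 Hz component standing in for fast noise.
t = np.arange(1200) / FS
slow = np.sin(2 * np.pi * 0.3 * t)
fast = np.sin(2 * np.pi * 2.0 * t)
filtered = filter_recording(slow + fast, h)
```

Because the low cut-off is very close to zero relative to the sampling rate, a long filter is needed, which is why the sketch filters the continuous recording before it is cut into trials.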

Feature Extraction
The fNIRS-BCI data are the time-series signals of multi-channel scattered light intensity changes collected by the fNIRS device. Statistical features directly reflect the statistical characteristics of fNIRS data and reduce redundant information, and are widely used in fNIRS data processing [18]. In the past few years, the main features extracted from fNIRS-BCI data for classical machine learning methods have been the average concentration of the signal within time windows measured in seconds [19,20,21,22] and the slope [21,22].
In the feature extraction stage, the average and slope within 0.5-second and 1-second windows, together with the average, linear regression fit value, variance, range, and skewness of the full data segment, were selected as the classification features. Each subject dataset could thus provide 125 features per channel: the average and linear regression of each time window, plus the average, linear regression, variance, range, and skewness of the entire data segment.
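One reading of this feature set that reproduces the count of 125 is sketched below. It assumes non-overlapping windows, a 200-sample trial at 10 Hz (the trial length given later in the text), and that the "linear regression" feature is the least-squares slope; none of these details are confirmed by the text. Under those assumptions, forty 0.5 s windows and twenty 1 s windows each give a mean and a slope (120 values), and the full segment adds five more.

```python
import numpy as np

FS = 10  # sampling rate in Hz (from the text)

def window_mean_slope(x, win):
    """Mean and least-squares slope in consecutive, non-overlapping windows of `win` samples."""
    segs = x[: len(x) // win * win].reshape(-1, win)
    tc = np.arange(win) / FS
    tc = tc - tc.mean()  # centred time axis in seconds
    means = segs.mean(axis=1)
    slopes = (segs - means[:, None]) @ tc / (tc @ tc)
    return means, slopes

def extract_features(x):
    """Feature vector for one channel of one trial.
    Layout: 0.5 s means, 0.5 s slopes, 1 s means, 1 s slopes,
    then full-segment mean, slope, variance, range, skewness."""
    x = np.asarray(x, dtype=float)
    parts = []
    for win in (FS // 2, FS):  # 0.5 s and 1 s windows
        parts.extend(window_mean_slope(x, win))
    _, full_slope = window_mean_slope(x, len(x))
    sd = x.std()
    skew = 0.0 if sd == 0 else float(np.mean(((x - x.mean()) / sd) ** 3))
    parts.append(np.array([x.mean(), full_slope[0], x.var(), x.max() - x.min(), skew]))
    return np.concatenate(parts)

# Synthetic example: a linear ramp with slope 2 per second.
t = np.arange(200) / FS
feats = extract_features(2.0 * t + 1.0)
print(feats.shape)  # (125,)
```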

Feature Screening and Classification
After data preprocessing and feature extraction, feature data of size 36×156×125 were first obtained. After the feature data were converted into two dimensions, a matrix of size 36×19500 was obtained. Channel screening and feature screening were then performed to remove redundant information and to determine the main activation channels (brain regions) of the mental arithmetic task and the main period of cerebral blood flow change. The classification contribution values of each single channel with all features (data of size 36×1×125) and of each single feature with all channels (data of size 36×156×1) were evaluated, respectively. The classification contribution value was computed as follows: the input feature data were classified by five classifiers (LDA, SVM, KNN, decision tree, and an ensemble learning classifier), and the average of the top three accuracies was used as the criterion.
Then, based on the channel and feature classification contribution values, the optimal combination of multiple channels and multiple features was determined. In this paper, combinations of multiple channels and multiple features were classified and evaluated exhaustively over different numbers of top-ranked channels and top-ranked features. Because only the number of channels and the number of features were varied during feature screening, the computation time needed to search for the optimal combination was greatly reduced. The approximate activation channels (in the spatial domain) and approximate activation period (in the temporal domain) of the mental arithmetic task were obtained from the optimal combination through a large number of calculations. The detailed procedure is shown in Fig. 2.
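The array shapes and the top-three criterion described above can be sketched as follows. The helper names are hypothetical, the feature array is filled with zeros as a placeholder, and the five accuracies in the example are made-up numbers used only to show the arithmetic, not results from the study.

```python
import numpy as np

def flatten_features(feats):
    """(trials, channels, features) -> (trials, channels * features)."""
    return feats.reshape(feats.shape[0], -1)

def single_channel_view(feats, ch):
    """All features of one channel: shape (trials, 1, features)."""
    return feats[:, ch:ch + 1, :]

def single_feature_view(feats, ft):
    """One feature across all channels: shape (trials, channels, 1)."""
    return feats[:, :, ft:ft + 1]

def contribution(accuracies, k=3):
    """Mean of the k highest accuracies among the candidate classifiers."""
    acc = np.sort(np.asarray(accuracies, dtype=float))
    return float(acc[-k:].mean())

feats = np.zeros((36, 156, 125))               # placeholder with the sizes from the text
print(flatten_features(feats).shape)           # (36, 19500)
print(single_channel_view(feats, 0).shape)     # (36, 1, 125)
print(single_feature_view(feats, 0).shape)     # (36, 156, 1)

# Illustrative accuracies from five classifiers (not real results).
example = contribution([0.6, 0.9, 0.8, 0.7, 0.5])
```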
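One way to read this search is that channels and features are first ranked by their contribution values, and only the number of top-ranked channels and top-ranked features is then varied. The sketch below follows that reading; it is an interpretation rather than the authors' confirmed procedure, and `evaluate` is a hypothetical callable standing in for the classifier-based criterion.

```python
import numpy as np

def search_top_k(feats, ch_scores, ft_scores, evaluate, max_ch, max_ft):
    """Try every (n_ch, n_ft) pair, keeping the n_ch best channels and n_ft best features.

    feats     : array of shape (trials, channels, features)
    ch_scores : contribution value per channel
    ft_scores : contribution value per feature
    evaluate  : callable taking a (trials, n_ch * n_ft) matrix and returning a score
    Returns (best_score, channel_indices, feature_indices).
    """
    ch_order = np.argsort(ch_scores)[::-1]  # best channel first
    ft_order = np.argsort(ft_scores)[::-1]  # best feature first
    best = (-np.inf, None, None)
    for n_ch in range(1, max_ch + 1):
        for n_ft in range(1, max_ft + 1):
            ch, ft = ch_order[:n_ch], ft_order[:n_ft]
            subset = feats[:, ch][:, :, ft].reshape(len(feats), -1)
            score = evaluate(subset)
            if score > best[0]:
                best = (score, ch, ft)
    return best

# Toy example with a dummy scoring rule that prefers exactly six columns.
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 5, 4))
best_score, best_ch, best_ft = search_top_k(
    toy,
    ch_scores=np.array([0.1, 0.9, 0.5, 0.2, 0.3]),
    ft_scores=np.array([0.4, 0.8, 0.1, 0.2]),
    evaluate=lambda X: -abs(X.shape[1] - 6),
    max_ch=5,
    max_ft=4,
)
```

With 156 channels and 125 features this grid has at most 156 × 125 = 19500 candidate pairs, whereas enumerating arbitrary subsets of channels and features would be infeasible.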
Finally, after the procedure of feature screening, the chosen features were put into 5 classifiers (i.e., LDA, SVM, KNN, Decision Tree, and Ensemble Learning classifier) to perform the classification task.

Deep Learning for fNIRS Signal Decoding
In this paper, a long short-term memory (LSTM) network combined with a fully convolutional network (FCN) [23], namely LSTM-FCN, was used. The FCN replaces the last three fully connected layers of the classic convolutional neural network (CNN) [24] with three convolutional layers.
The original fNIRS data of the 8 subjects were segmented according to the data interception method, generating data with dimensions of 36×156×200, where 36 denotes the number of experimental trials, 156 the number of channels, and 200 the length of a trial. The input data were randomly split in a 4:1 ratio into a training set and a test set. The experimental results reported below were averaged over 5 cross-validation runs.
1-D convolution has proven to be an effective learning method for time series classification problems [24]. An FCN with 1-D convolution is commonly used as a feature extractor. Global average pooling [25] was applied to reduce the number of parameters in the model before classification. In the proposed model, the FCN was enhanced by an LSTM module, followed by dropout to prevent over-fitting and to accelerate the training process [26]. The FCN module consisted of three stacked 1-D convolution blocks with 128, 256, and 128 filters, respectively. Each convolution block was identical to the convolution block in the FCN architecture proposed by Wang [24]: a temporal convolutional layer, followed by batch normalization [27] and then the activation function. The time series data were also input into the LSTM block, followed by dropout. The output of the FCN module and the output of the LSTM were concatenated. The architecture of LSTM-FCN for fNIRS-BCI data is shown in Fig. 3.
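The 4:1 split with five repetitions is consistent with 5-fold cross-validation, in which each fold serves once as the test set. The sketch below assumes that reading; the text does not say whether folds or independent random splits were used, and `fit_and_score` is a hypothetical callable standing in for model training and evaluation.

```python
import numpy as np

def five_fold_indices(n_trials, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each trial lands in exactly one test fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_trials), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]

def mean_cv_accuracy(n_trials, fit_and_score):
    """Average the accuracy returned by fit_and_score(train_idx, test_idx) over the folds."""
    scores = [fit_and_score(tr, te) for tr, te in five_fold_indices(n_trials)]
    return float(np.mean(scores))

splits = list(five_fold_indices(36))  # 36 trials per subject (from the text)
print([len(te) for _, te in splits])  # [8, 7, 7, 7, 7]
```

With 36 trials the folds cannot be exactly equal, so the train/test ratio is 28:8 or 29:7, which is close to the stated 4:1.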
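The effect of global average pooling on model size can be illustrated with some shape and parameter bookkeeping for the FCN branch. The filter counts and input size come from the text; the kernel lengths, the LSTM width, the two-class output, and length-preserving padding are assumptions chosen for illustration, and batch-normalization parameters are left out of the counts.

```python
N_TIME, N_CHANNELS = 200, 156  # trial length and channel count (from the text)
FILTERS = (128, 256, 128)      # filters per convolution block (from the text)
KERNELS = (8, 5, 3)            # ASSUMPTION: kernel lengths are not given in the text
LSTM_UNITS = 8                 # ASSUMPTION: LSTM width is not given in the text
N_CLASSES = 2                  # ASSUMPTION: task versus rest

def conv1d_params(in_ch, out_ch, kernel):
    """Weights plus biases of a 1-D convolution layer."""
    return kernel * in_ch * out_ch + out_ch

def fcn_branch(n_channels, filters, kernels):
    """Return (parameter count per block, number of feature maps after the last block)."""
    counts, in_ch = [], n_channels
    for out_ch, kernel in zip(filters, kernels):
        counts.append(conv1d_params(in_ch, out_ch, kernel))
        in_ch = out_ch
    return counts, in_ch

counts, fcn_width = fcn_branch(N_CHANNELS, FILTERS, KERNELS)

# Global average pooling collapses (N_TIME, fcn_width) to (fcn_width,);
# flattening would instead keep N_TIME * fcn_width values.
pooled_width = fcn_width
flattened_width = N_TIME * fcn_width

# The pooled FCN output is concatenated with the LSTM output before the classifier.
concat_width = pooled_width + LSTM_UNITS
dense_params_pooled = concat_width * N_CLASSES + N_CLASSES
dense_params_flat = (flattened_width + LSTM_UNITS) * N_CLASSES + N_CLASSES

print(counts)                                  # [159872, 164096, 98432]
print(concat_width)                            # 136
print(dense_params_pooled, dense_params_flat)  # 274 51218
```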

Channel and Feature Screening in Classical Machine Learning
In the optimal channel determination procedure, the classical machine learning methods classified 156 channels and 125 features without prior conditions, and exhaustive tests were performed on all channels to find the optimal ones. After the data of several subjects were processed, it could be determined that the brain activation of the mental arithmetic task was most significant in channels 26, 36, 37, 46, and 48, as shown in Fig. 4. Moreover, it was found in the feature screening stage that the average value of the data in the interval [6 s, 17 s] contributed more to classification performance. The processing of each subject could provide a prior condition for processing the next, which helped determine the activated brain area and activation time period.
In Table 1, the highest accuracy among the classical machine learning methods for the subject-related classifier was 91%, achieved by SVM, and the accuracy of the deep learning method was 95.3%. Under the subject-unrelated condition, the accuracy of classical machine learning reached 83%, and the accuracy of deep learning reached 97.1%. For subjects S03, S04, S07, and S08, 100% accuracy could be achieved under different network dropout rates. On this evidence, the deep learning method, i.e., LSTM-FCN, was significantly better than the classical machine learning methods, and 100% accuracy could make a BCI system more stable.
In the subject-related classification, the accuracy for S05 and S06 was lower than that for the other subjects. With the deep learning method, however, the accuracy for S05 and S06 could reach 90%. That is, the deep learning method (LSTM-FCN) still achieved better decoding performance for the subjects that were difficult for classical machine learning. In the subject-unrelated classification, the deep learning method had significantly higher accuracy than the classical machine learning methods: its highest accuracy was 14.1 percentage points higher than that of the classical machine learning methods and was very close to 100%. Thus, according to Table 1, it can be concluded that the LSTM-FCN method, as a representative of deep learning, had better performance for fNIRS-BCI decoding.

Conclusion
In this paper, we made a full decoding performance comparison between classical machine learning methods and a deep learning method on fNIRS-BCI data. According to the experimental results, the LSTM-FCN method, as a representative of deep learning, showed superior performance to all the classical machine learning methods tested, including decision tree, LDA, SVM, KNN, and the ensemble classifier. In terms of training time, the classical machine learning methods required additional time for channel screening and feature screening. As the amount of data increases and new models are developed, deep learning methods could achieve high classification accuracy and robustness even on small fNIRS-BCI datasets. Thus, deep learning is a promising tool for decoding fNIRS-BCI data in comparison to classical machine learning.

Fig. 2 Flowchart of channel and feature screening

Fig. 4 The activation channel map of mental arithmetic task

Table 1. Accuracy of classical machine learning methods and deep learning method