Driver Fatigue Detection Using Multitask Cascaded Convolutional Networks

Abstract. Driving fatigue is one of the main causes of traffic accidents. In this paper, we apply multitask cascaded convolutional networks to face detection and alignment in order to ensure both the accuracy and the real-time performance of the algorithm. Another convolutional neural network (CNN) is then used for eye state recognition. Finally, we calculate the percentage of eyelid closure (PERCLOS) to detect fatigue. The experimental results show that the proposed method recognizes the eye state with high accuracy and detects fatigue effectively in real time.


Introduction
With the development of the automobile and transportation industries, traffic accidents have caused great property loss and harm to society. More than 20% of these accidents are caused by fatigued driving. Safe driving has become a pressing issue in today's society. It is therefore of great significance to develop a real-time and accurate fatigue detection system that sends a warning when the driver is tired, which can effectively reduce the occurrence of traffic accidents. At present, fatigue detection follows three main directions. The first is based on the vehicle state and mainly uses signals such as the steering angle and the driving speed to decide whether the driver is fatigued; this method is subject to external interference, which strongly affects the detection accuracy. The second is based on the driver's physiological information [7] and mainly uses physiological signals such as heart rate and pulse to determine whether the driver is fatigued; this method requires the driver to wear a lot of measuring equipment, which is cumbersome and interferes considerably with driving. The third is based on computer vision [6][8][9][10]; this method is non-intrusive and detects fatigue by analyzing changes in facial features, such as eye closure duration and yawning.
In fatigue detection, detecting and aligning the driver's face are important steps. The multitask cascaded convolutional networks for face detection and alignment [1] have proven to be an effective method. Another very important step is detecting the state of the eyes. Compared with the traditional active infrared illumination method [2], imaging with a normal camera is a safer, passive approach. Many methods exist for detecting the eye state, such as the AdaBoost classifier [3] and the SVM classifier [4]. However, their ability to express features is limited. Recently, convolutional neural networks (CNNs) have achieved remarkable progress in a variety of computer vision tasks. In this paper, we design a driver fatigue detection system using multitask cascaded convolutional networks. As shown in Figure 1, the method consists of five parts: joint face detection and alignment using multitask cascaded convolutional networks, normalization of the current image and ground-truth shape according to the scaled mean shape, extraction of the eye area, eye state recognition, and fatigue detection.

Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks
A fatigue detection system should have high recognition accuracy and detect fatigue effectively in real time. The difficulties lie in detecting the driver's face and aligning the eyes quickly and accurately while remaining robust to changes in illumination. Kaipeng Zhang et al. [1] propose a cascaded CNN-based framework for joint face detection and alignment, with a carefully designed lightweight CNN architecture for real-time performance. The overall pipeline is shown in Figure 2. The input image is first resized to different scales to build an image pyramid, which is the input of the following three-stage cascaded framework.
Stage 1: A fully convolutional network, called the proposal network (P-Net), produces candidate facial windows and their bounding box regression vectors. The candidates are then calibrated using the estimated bounding box regression vectors. After that, non-maximum suppression (NMS) is applied to merge highly overlapping candidates.
Stage 2: All candidates are fed to another CNN, called the refine network (R-Net), which further rejects a large number of false candidates, performs calibration with bounding box regression, and applies NMS.
Stage 3: This stage is similar to the second stage, but it aims to identify face regions with more supervision. In particular, the network outputs the positions of five facial landmarks.
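The NMS step used in the stages above can be illustrated with a minimal sketch. It assumes the common greedy variant, boxes given as (x1, y1, x2, y2), and an IoU threshold of 0.5; the function name `nms`, the box format, and the threshold are illustrative assumptions, not values taken from [1].

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS sketch: keep the best box, drop boxes that overlap it, repeat.

    boxes is a non-empty (N, 4) array of (x1, y1, x2, y2); scores has length N.
    The 0.5 threshold is an assumed default. Returns kept indices, best first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # candidate indices, highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining candidate.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Only candidates that do not overlap the best box too much survive.
        order = rest[iou <= iou_threshold]
    return keep


# Two heavily overlapping candidates and one separate candidate.
candidates = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
confidences = [0.9, 0.8, 0.7]
kept = nms(candidates, confidences)
```

Here the second candidate overlaps the first with an IoU of about 0.82, so only the first and third candidates are kept.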

Face normalization
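In order to extract the eye areas accurately, we first calculate the average face and then normalize the current image and ground-truth shape according to the scaled mean shape. This process is a 2D affine transformation, which changes the rotation angle, the scale, and the location of a shape. The transformation can be represented as equation (1).
$$x' = ax + by + c, \qquad y' = dx + ey + f \tag{1}$$
Applied to every feature point, it gives equation (2).
$$\begin{bmatrix} x'_i \\ y'_i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \quad i = 1, \dots, n \tag{2}$$
where $(x'_i, y'_i)$ is the coordinate of the $i$th feature point on the average face and $(x_i, y_i)$ is the coordinate of the corresponding feature point on the detected face. For convenience, equation (2) can be rewritten as equation (3).
$$Y = HX \tag{3}$$
where $Y$ is the feature point matrix of the average face, $X$ is the feature point matrix of the detected face (in homogeneous coordinates, with a final row of ones), and $H$ is the affine transformation matrix. $H$ can be calculated as a least squares solution, which is given by equation (4).
$$H = Y X^{T} \left( X X^{T} \right)^{-1} \tag{4}$$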
Using the estimated transformation, the current image and its ground-truth shape are normalized to the scaled mean shape, which corrects the rotation angle, the scale, and the location of the detected face, as shown in Figure 3.
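The least-squares estimate of the affine transformation matrix described above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the function names `estimate_affine` and `apply_affine`, the symbol `H`, and the five mean-shape coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_affine(detected, mean_shape):
    """Least-squares 2x3 affine matrix H that maps detected points onto mean_shape.

    Both inputs are (N, 2) arrays of (x, y) landmarks with N >= 3
    non-collinear points.
    """
    detected = np.asarray(detected, dtype=float)
    mean_shape = np.asarray(mean_shape, dtype=float)
    # Homogeneous coordinates: each row is (x, y, 1).
    A = np.hstack([detected, np.ones((len(detected), 1))])
    # Solve A @ H.T ~= mean_shape in the least-squares sense.
    H_t, *_ = np.linalg.lstsq(A, mean_shape, rcond=None)
    return H_t.T


def apply_affine(H, points):
    """Apply the 2x3 affine matrix H to an (N, 2) array of points."""
    points = np.asarray(points, dtype=float)
    return points @ H[:, :2].T + H[:, 2]


# Hypothetical mean-shape landmarks (eyes, nose, mouth corners).
mean_shape = np.array([[30, 30], [70, 30], [50, 50], [35, 70], [65, 70]], dtype=float)

# Simulate a detected face: the mean shape rotated by 10 degrees,
# scaled by 1.5, and shifted.
theta = np.deg2rad(10.0)
R = 1.5 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
detected = mean_shape @ R.T + np.array([20.0, -5.0])

H = estimate_affine(detected, mean_shape)
aligned = apply_affine(H, detected)  # should coincide with mean_shape
```

In practice the same matrix would also be used to warp the image itself, for example with an image library's affine warp routine; that step is omitted here.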

Eye area extraction
In this paper, we extract the eye areas based on the facial landmarks after normalization, as shown in Figure 4. Each eye area has a size of 32×32 pixels.
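One way to extract a 32×32 eye area from a normalized face is sketched below. The paper does not state how the window is positioned, so centring it on the eye landmark and shifting it to stay inside the image are assumptions, as are the function name `crop_eye`, the face size, and the landmark coordinates.

```python
import numpy as np


def crop_eye(image, eye_center, size=32):
    """Return a size x size patch centred on eye_center = (x, y).

    Assumption: the window is shifted to stay inside the image, so the
    image must be at least size pixels in each dimension.
    """
    h, w = image.shape[:2]
    half = size // 2
    x = int(round(eye_center[0])) - half
    y = int(round(eye_center[1])) - half
    # Clamp the top-left corner so the window never leaves the image.
    x = min(max(x, 0), w - size)
    y = min(max(y, 0), h - size)
    return image[y:y + size, x:x + size]


# Hypothetical normalized grayscale face and eye landmark positions.
face = np.zeros((112, 96), dtype=np.uint8)
left_eye = crop_eye(face, (30, 40))
right_eye = crop_eye(face, (66, 40))
```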

Eye state recognition
A CNN expresses features better and avoids manual feature selection, so we use a convolutional neural network to recognize the state of the eyes.

Convolutional neural network
To

Activation Functions
The sigmoid and tanh functions are commonly used non-linear activation functions, but they suffer from vanishing gradients. We therefore use the ReLU (rectified linear unit) function, which is defined as equation (5).
$$f(x) = \max(0, x) \tag{5}$$
ReLU effectively alleviates the vanishing gradient problem, so that a deep neural network can be trained directly in a supervised manner. Because of its one-sided suppression, the network output after the ReLU function is sparse.
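The two properties mentioned above, non-vanishing gradients and sparse outputs, can be checked numerically. The following sketch compares ReLU with the sigmoid function on a few sample inputs; the helper names are illustrative.

```python
import numpy as np


def relu(x):
    """ReLU: negative inputs are suppressed to zero (one-sided suppression)."""
    return np.maximum(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (np.asarray(x) > 0).astype(float)


def sigmoid_grad(x):
    """Derivative of the sigmoid; it never exceeds 0.25 and decays for large |x|."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
    return s * (1.0 - s)


x = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
activations = relu(x)           # zeros for the non-positive inputs: a sparse output
relu_slopes = relu_grad(x)      # stays at 1 for every positive input
sigmoid_slopes = sigmoid_grad(x)  # shrinks towards 0 at both ends
```

Because the sigmoid derivative is at most 0.25, multiplying it across many layers shrinks the gradient quickly, whereas the ReLU derivative stays at 1 wherever the unit is active.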

Fatigue detection based on PERCLOS
After eye area extraction, the next step is to detect driver fatigue based on PERCLOS (percentage of eyelid closure over the pupil over time). PERCLOS is an established parameter for measuring the level of drowsiness and is used to detect driver fatigue [5]; the level of drowsiness is judged by comparing it with a threshold value. It is calculated as equation (6).
$$\mathrm{PERCLOS} = \frac{n}{N} \tag{6}$$
where $n$ is the number of eye-closed frames over a period of time and $N$ is the total number of frames over the same period. When the driver is fatigued, the PERCLOS value is higher than normal. We therefore set a PERCLOS threshold: when the driver's PERCLOS value exceeds this threshold, the driver is considered fatigued.
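As a small worked example, PERCLOS can be computed from a sequence of per-frame eye states as the ratio of eye-closed frames to total frames. The function name `perclos` and the boolean input format are assumptions made for illustration.

```python
def perclos(eye_closed):
    """Fraction of frames in which the eye is closed.

    eye_closed is a sequence of booleans, one per frame (True = closed).
    """
    total = len(eye_closed)
    if total == 0:
        raise ValueError("need at least one frame")
    closed = sum(bool(state) for state in eye_closed)
    return closed / total


# 100 frames, 30 of them with closed eyes.
frames = [False] * 70 + [True] * 30
value = perclos(frames)
```

For this sequence, 30 closed frames out of 100 give a PERCLOS value of 0.3.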

Training
In order to overcome the influence of lighting on the images, the training data must contain samples captured under different light intensities to enhance the robustness of the network, as shown in Figure 6. Since we perform eye state recognition, we use the following two kinds of data annotation in our training process: (1) negatives: 36×36 sample areas randomly cropped near the eye area whose intersection-over-union (IoU) ratio with every ground-truth eye is less than 0.4, as shown in Figure 7.
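The negative-sample rule above can be sketched as follows: draw a random 36×36 window near an eye and accept it only if its IoU with every ground-truth eye box is below 0.4. The function names, the (x1, y1, x2, y2) box format, the maximum offset of 40 pixels, and the example eye boxes are assumptions made for illustration.

```python
import random


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_negative(eye_boxes, image_size, patch=36, max_iou=0.4,
                    max_offset=40, rng=random, max_tries=1000):
    """Pick a random patch x patch window near an eye with IoU < max_iou
    to every ground-truth eye box. Returns None if no window is found.

    max_offset (how far from the eye the window may start) is an assumption.
    """
    width, height = image_size
    for _ in range(max_tries):
        ex1, ey1, _, _ = rng.choice(eye_boxes)
        x = ex1 + rng.randint(-max_offset, max_offset)
        y = ey1 + rng.randint(-max_offset, max_offset)
        # Keep the window inside the image.
        x = min(max(x, 0), width - patch)
        y = min(max(y, 0), height - patch)
        box = (x, y, x + patch, y + patch)
        if all(iou(box, eye) < max_iou for eye in eye_boxes):
            return box
    return None


# Hypothetical ground-truth eye boxes in a 320x240 frame.
eyes = [(100, 80, 136, 116), (180, 80, 216, 116)]
negative = sample_negative(eyes, (320, 240), rng=random.Random(0))
```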

Training results
We select images of open and closed eyes as positive samples and randomly crop several patches to collect negative samples. In total, 120000 images are used as training samples. The eye state recognition accuracy of the network as a function of the number of training iterations is shown in Figure 9. As the number of iterations increases, the accuracy gradually increases, and the final accuracy fluctuates between 0.995 and 0.996. In order to test the performance of the network, we collected three video sequences; the accuracy on each is shown in Table 1. To measure the running time, we used 5 test videos comprising 1239 frames of 320×240 images and computed the average time consumed by each module and by the whole method. Table 2 lists the timing results. The method meets the real-time requirement.

Fatigue detection based on PERCLOS
When the driver is fatigued, the driver's PERCLOS value is higher than normal, so the driver is considered fatigued when the PERCLOS value exceeds a preset threshold. In this paper, the PERCLOS threshold is set to 0.30: a PERCLOS value greater than 0.30 indicates that the driver is fatigued. Figure 10 shows the PERCLOS results, and Figure 11 shows sample images of the detection results.
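For a live video stream, the 0.30 threshold can be applied to a sliding window of the most recent frames. The sketch below is one possible arrangement, not the paper's implementation: the class name `FatigueMonitor`, the default window length of 150 frames, and the choice to wait for a full window before raising a warning are all assumptions.

```python
from collections import deque


class FatigueMonitor:
    """Sliding-window PERCLOS monitor (illustrative sketch)."""

    def __init__(self, window_size=150, threshold=0.30):
        # window_size is an assumed value; the threshold 0.30 is from the text.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, eye_closed):
        """Add one frame's eye state and return (perclos, is_fatigued)."""
        self.window.append(bool(eye_closed))
        value = sum(self.window) / len(self.window)
        # Assumption: only warn once the window is full, to avoid
        # false alarms during the first few frames.
        full = len(self.window) == self.window.maxlen
        return value, full and value > self.threshold


# Short demonstration with a 10-frame window: 4 of 10 frames are closed.
monitor = FatigueMonitor(window_size=10)
states = [False] * 6 + [True] * 4
results = [monitor.update(state) for state in states]
last_value, last_fatigued = results[-1]
```

After the tenth frame the window is full and the PERCLOS value of 0.4 exceeds 0.30, so the monitor reports fatigue; earlier frames never trigger a warning because the window is not yet full.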

Conclusion
In this paper, we propose a driver fatigue detection system. The system applies multitask cascaded convolutional networks to face detection and alignment and then uses another convolutional neural network (CNN) for eye state recognition. Finally, we calculate the percentage of eyelid closure (PERCLOS) to detect fatigue. The eye state recognition method achieves high accuracy, and the system detects fatigue effectively in real time. The tests show that the system infers fatigue reliably.