Automatic Driving Decision Algorithm Based on Multi-dimensional Deep Space-Time Network

This paper studies an automatic driving decision algorithm based on a multi-dimensional deep spatiotemporal network. Forward-view images of the road were captured by a camera mounted on the vehicle, and these images, together with the steering wheel angle and speed, were collected as training input data for the model. The multi-frame vehicle images were pre-processed, and the resulting low-level feature images and the original images were used as inputs to the multi-dimensional spatiotemporal decision network. In this network, multiple three-dimensional convolution paths extract high-level spatiotemporal features from the original images and from the low-level features and fuse them, and the fused features are used for the automatic driving decision. The network was trained on a driver's driving data, which yielded the multi-dimensional spatiotemporal decision model. This model uses multi-dimensional spatiotemporal information to output automatic driving decisions directly, and it can effectively reproduce the driver's decision data.


Introduction
Generally, an automatic driving system consists of three modules: the environment perception module, the planning and decision module, and the vehicle control module. The environment perception module obtains road and traffic environment information. The planning and decision module calculates the vehicle's trajectory on the road, as well as its driving direction and speed at each point along that trajectory.
The main task of the vehicle control module is to carry out the control commands of the decision module: it sends control instructions to the actuators according to the current vehicle state and thereby completes the automatic driving task.
The planning and decision module is the most important and most challenging part of an automatic driving system. With the rise of artificial intelligence, more and more companies and researchers have applied techniques such as deep learning and reinforcement learning to the planning and decision-making of automatic driving systems, with notable results. Comma.AI made an early attempt at an automatic driving system built on a mobile phone: the system obtains road and traffic information mainly through the phone's camera and then runs a deep network on the phone to output decision information and complete the driving task. Mobileye's planning and decision module consists of two parts, a non-learning part and a learning part. The non-learning part uses a rule-based decision algorithm for vehicle trajectory planning and decision-making, while the decision algorithm of the learning part is a deep reinforcement learning network trained on a large amount of driving data. The combination of these two decision algorithms constitutes the control strategy of Mobileye's planning and decision module.
Most current decision algorithms based on deep learning networks consider only the spatial information at the current moment and ignore the temporal information contained in the dynamic process of driving. In this paper, the vehicle images captured by the on-board camera, together with the steering wheel angle and speed, were collected as training data for the deep learning network. The images are pre-processed and, together with the vehicle's driving data, used to build the multi-dimensional deep spatiotemporal network. The trained network was then used to make decisions for the automatic vehicle.

Data collection and data processing
In this paper, the automatic driving decision algorithm takes vehicle driving images as input. The collected driving images were used both to train the multi-dimensional spatiotemporal decision network and to test the decision algorithm. A camera mounted on the vehicle captured the front view of the vehicle.
A total of 1.2 million usable pictures were collected. The raw data had to be converted into a format that the deep learning model can read easily, so the data were saved in the HDF5 format (via 'h5py') for its high access efficiency. Pictures and steering labels are saved in two separate HDF5 files for easy reading. To speed up model training, the original pictures were downscaled to 640 × 320 before being saved.
The driving decision data of a skilled driver (such as the steering wheel angle and speed) were recorded to train the multi-dimensional deep spatiotemporal network. The sampling frequency of the pictures is lower than that of the driving decision data, so the decision data were resampled to the picture frequency. In other words, time stamps were attached to both the images and the driving decision data, and for each picture the decision record with the nearest time stamp was taken as its driving decision data. Driving decision data such as the steering wheel angle and speed are acquired from the vehicle CAN bus or from external sensors.
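The nearest-time-stamp matching described above can be sketched as follows. This is a minimal illustration, assuming both streams carry time stamps in the same unit and that the decision records are sorted by time; the function name and argument layout are hypothetical.

```python
import bisect


def align_to_frames(frame_times, decision_times, decision_values):
    """For each image time stamp, return the decision record whose time stamp is nearest.

    decision_times must be sorted ascending and paired index-by-index with
    decision_values (e.g. hypothetical (angle, speed) tuples).
    """
    aligned = []
    for t in frame_times:
        i = bisect.bisect_left(decision_times, t)
        # Candidates: the last record before t and the first record at or after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(decision_times)]
        # On a tie, min() keeps the first candidate, i.e. the earlier record.
        best = min(candidates, key=lambda j: abs(decision_times[j] - t))
        aligned.append(decision_values[best])
    return aligned
```

Because the decision stream is sampled faster than the camera, each picture gets exactly one label and surplus decision records are simply skipped.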

Feature extraction of data
To better exploit the information in the multi-frame vehicle images, the images are pre-processed to obtain low-level feature images. Two kinds of low-level feature images were used in this study: gradient and optical flow. The gradient image was obtained with an edge gradient operator, and the optical flow image with a dense optical flow algorithm. Because perspective deformation makes the image content differ by direction, both the gradient image and the optical flow image were split into X and Y components, giving a total of four low-level features. Each optical flow frame is computed from two original frames, so if each decision uses T frames of driving images, the X- and Y-direction optical flow images each contain T-1 frames, while the X- and Y-direction gradient images each contain T frames.
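A minimal sketch of the four low-level features and their frame counts is shown below. The paper does not name its edge gradient operator, so a 3 × 3 Sobel kernel is assumed here; `flow_fn` is a hypothetical placeholder for any dense optical flow routine (for example a Farnebäck implementation) that returns X and Y flow components for a pair of frames.

```python
def sobel_xy(img):
    """Return (gx, gy) Sobel responses for a 2-D list image; border pixels stay 0."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Right column minus left column, centre row weighted by 2.
            gx[y][x] = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                        - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            # Bottom row minus top row, centre column weighted by 2.
            gy[y][x] = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                        - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
    return gx, gy


def low_level_features(frames, flow_fn):
    """frames: list of T grayscale images (T >= 1).

    Returns (grad_x, grad_y, flow_x, flow_y): T gradient frames per direction
    and T-1 flow frames per direction, since flow needs consecutive pairs.
    """
    grad_x, grad_y = zip(*(sobel_xy(f) for f in frames))
    flows = [flow_fn(a, b) for a, b in zip(frames, frames[1:])]
    flow_x = [fx for fx, _ in flows]
    flow_y = [fy for _, fy in flows]
    return list(grad_x), list(grad_y), flow_x, flow_y
```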
The multi-dimensional spatiotemporal decision network takes the original images and the low-level feature images as input and extracts spatiotemporal features to make decisions. Its structure is shown in Figure 1. The network uses multiple paths to obtain high-level features, fuses these high-level features, and makes decisions. Each input feature has its own path for extracting high-level features. All paths share the same network structure, a stack of four three-dimensional convolution modules, as shown in Figure 2. Each three-dimensional convolution module consists of a three-dimensional convolution layer, a batch normalization layer, and an activation layer, as shown in Figure 3. The three-dimensional convolution layer in the 3D convolution module extends the commonly used image convolution along the time dimension. Concretely, a kernel of size P × Q × R is convolved across the time dimension as well as the two spatial dimensions, so the spatial and temporal information contained in the multi-frame vehicle images is captured simultaneously. Setting an appropriate stride reduces the size of each dimension and thus the computational cost. Although a three-dimensional convolution in the lower layers of the network captures only local spatiotemporal information, global spatiotemporal information is gradually acquired by stacking multiple three-dimensional convolution layers.
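To make the 3D convolution and the effect of the stride concrete, here is a naive single-channel sketch. It assumes the standard deep learning form (cross-correlation with "valid" padding), out[t][i][j] = b + Σ_r Σ_p Σ_q k[r][p][q] · x[t·s_t + r][i·s_h + p][j·s_w + q], with the kernel indexed as time × height × width; the mapping of P, Q, R onto these axes and all names are assumptions. Along each axis the output length is floor((n − k) / s) + 1.

```python
def conv3d_valid(x, k, stride=(1, 1, 1), bias=0.0):
    """Naive single-channel 3-D cross-correlation with 'valid' padding.

    x: input volume indexed x[t][i][j] (time, height, width)
    k: kernel indexed k[r][p][q] (time, height, width)
    stride: (st, sh, sw) step along time, height and width
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    R, P, Q = len(k), len(k[0]), len(k[0][0])
    st, sh, sw = stride
    # Output length per axis: floor((n - k) / s) + 1.
    out_t = (T - R) // st + 1
    out_h = (H - P) // sh + 1
    out_w = (W - Q) // sw + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_t)]
    for t in range(out_t):
        for i in range(out_h):
            for j in range(out_w):
                acc = bias
                # Sum over the kernel window in time and space at once.
                for r in range(R):
                    for p in range(P):
                        for q in range(Q):
                            acc += k[r][p][q] * x[t * st + r][i * sh + p][j * sw + q]
                out[t][i][j] = acc
    return out
```

Because the kernel spans R frames, each output value mixes neighbouring frames, and stacking such layers widens the temporal receptive field, which matches the local-to-global argument above.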
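The batch normalization layer in the 3D convolution module speeds up training, improves training stability, reduces overfitting, and improves network performance to some extent. During training, the data of each training batch are normalized:
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$
where $\mu_B$ and $\sigma_B^2$ are the mean and variance of the training batch at the batch normalization layer, $\gamma$ and $\beta$ are learned scale and shift parameters, and $\epsilon$ is a small constant for numerical stability. When the trained network is used to make decisions, the data are not fed in batches, so the mean and variance are taken as
$$E[x] = E_B[\mu_B], \qquad \mathrm{Var}[x] = \frac{m}{m-1}\, E_B[\sigma_B^2]$$
where $m$ is the batch size, and these values replace $\mu_B$ and $\sigma_B^2$ in the normalization. That is, the average of the batch means over all training batches is used as the mean at decision time, and the unbiased estimate of the variance computed from all training batches is used as the variance.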
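A scalar sketch of the two batch normalization phases described in this paragraph follows, assuming one feature channel and batches given as plain lists; the function names and default values of `gamma`, `beta`, and `eps` are illustrative assumptions.

```python
import math


def batch_norm_train(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one training batch with its own mean and (biased) variance."""
    m = len(batch)
    mu = sum(batch) / m
    var = sum((v - mu) ** 2 for v in batch) / m
    out = [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in batch]
    return out, mu, var


def inference_stats(batch_means, batch_vars, m):
    """Decision-time statistics: the mean of the batch means, and the unbiased
    variance estimate m/(m-1) times the mean of the batch variances."""
    mean = sum(batch_means) / len(batch_means)
    var = m / (m - 1) * sum(batch_vars) / len(batch_vars)
    return mean, var


def batch_norm_infer(v, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a single input with the fixed decision-time statistics."""
    return gamma * (v - mean) / math.sqrt(var + eps) + beta
```

Deep learning frameworks usually approximate these population statistics with running moving averages updated during training rather than storing every batch's statistics.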
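A parametric rectified linear unit (PReLU) was used in the activation layer:
$$f(x_i) = \begin{cases} x_i, & x_i > 0 \\ a_i x_i, & x_i \le 0 \end{cases}$$
where $a_i$ is a learnable coefficient. This activation alleviates the vanishing gradient problem and makes the network easier to train and converge.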
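The PReLU forward pass and its two partial derivatives can be sketched for a scalar input as below (names are hypothetical). The derivative with respect to the input is 1 for positive inputs and a otherwise, so the gradient does not vanish for negative inputs as long as a is non-zero, and a itself receives the gradient x on the negative side.

```python
def prelu(x, a):
    """Parametric ReLU: identity for x > 0, slope a for x <= 0."""
    return x if x > 0 else a * x


def prelu_grads(x, a):
    """Return (df/dx, df/da) for a scalar input x and slope a."""
    return (1.0, 0.0) if x > 0 else (a, x)
```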
Once each path has extracted its high-level features, these different high-level features are fused into a single spatiotemporal feature by a fully connected layer:
$$F = \sum_{i} W_i f_i$$
where $F$ is the fused feature, and $f_i$ and $W_i$ are the $i$-th kind of high-level feature and its corresponding weight matrix. Finally, a fully connected output layer (weight matrix $W_{out}$ in Fig. 1) transforms the fused feature into decision parameters such as the steering wheel angle and speed.
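The fusion step and the output layer can be sketched with plain matrix-vector products. Biases and activations are omitted, and the shapes, names, and two-value output (angle, speed) are assumptions for illustration.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def fuse(features, weights):
    """F = sum_i W_i f_i: project each path's feature with its own matrix and add.

    Each W_i must map its feature f_i to the same fused dimension.
    """
    fused = None
    for f_i, W_i in zip(features, weights):
        proj = matvec(W_i, f_i)
        fused = proj if fused is None else [a + b for a, b in zip(fused, proj)]
    return fused


def decide(fused, W_out):
    """Output layer: map the fused feature to decision values, e.g. [angle, speed]."""
    return matvec(W_out, fused)
```

Summing the per-path projections is equivalent to one fully connected layer applied to the concatenated features [f_1; f_2; ...] with the block weight matrix [W_1 W_2 ...], so the paths may produce features of different lengths.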

Driver decision-making model training
With the driving images as input, the constructed multi-dimensional deep spatiotemporal network was trained on the driver's driving decision data and the images corresponding to those data. The training framework is shown in Figure 4. Training uses mini-batch gradient descent. Each batch contains N samples; each sample is a sequence of T vehicle driving frames, and these samples, together with their pre-processed low-level feature images, are fed into the multi-dimensional deep spatiotemporal decision network. The label of each sample is the driver's decision data corresponding to the sample's T-th frame. The error between the decision values output by the network and the corresponding skilled driver's decisions is computed and back-propagated, and the network parameters are updated until the network converges. A common error function is the mean squared error:
$$L = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2$$
where $\hat{y}_i$ is the decision value output by the network and $y_i$ is the decision value of the corresponding skilled driver.
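The mean squared error and one gradient-descent update can be made concrete with a deliberately tiny model. In this sketch the deep network is replaced by a hypothetical single linear unit y_hat = w·x + b, so only the loss and the update rule carry over to the paper's setting.

```python
def mse(pred, target):
    """Mean squared error over one batch of N predictions."""
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / n


def sgd_step(w, b, batch_x, batch_y, lr):
    """One mini-batch gradient-descent step for y_hat = w*x + b under MSE."""
    n = len(batch_x)
    preds = [w * x + b for x in batch_x]
    # dL/dy_hat_i = 2 * (y_hat_i - y_i) / N
    errs = [2.0 * (p - y) / n for p, y in zip(preds, batch_y)]
    grad_w = sum(e * x for e, x in zip(errs, batch_x))
    grad_b = sum(errs)
    return w - lr * grad_w, b - lr * grad_b
```

Repeating `sgd_step` over successive batches until the loss stops decreasing mirrors the "update until the network converges" loop; in the real network the gradients come from back-propagation through all layers.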
At decision time, a buffer queue of size T, the same number of frames used to train the network, is established. Captured vehicle images are pushed into the queue, and once the queue is full, the T frames stored in it are used for decision-making. Whenever a new image is captured, the queue is updated: the oldest image at the head is removed and the new image is appended at the tail, as shown in Figure 5. The images in the buffer queue are pre-processed, and the original images and the pre-processed low-level feature images are fed into the multi-path spatiotemporal decision network loaded with the trained parameters to obtain the decision results.
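The buffer queue behaves like a fixed-length sliding window. A minimal sketch, assuming frames arrive one at a time (the class name is hypothetical), uses `collections.deque` with `maxlen`, which discards the oldest element automatically when a new one is appended to a full queue.

```python
from collections import deque


class FrameBuffer:
    """Sliding window holding the most recent T frames for decision-making."""

    def __init__(self, T):
        # With maxlen set, append() on a full deque drops the head element.
        self.frames = deque(maxlen=T)

    def push(self, frame):
        self.frames.append(frame)

    def ready(self):
        """True once T frames are available, i.e. a decision can be made."""
        return len(self.frames) == self.frames.maxlen

    def window(self):
        """Frames ordered oldest to newest, as fed to pre-processing."""
        return list(self.frames)
```

In a control loop, each new camera frame would be pushed, and whenever `ready()` is true the window would be pre-processed and passed to the loaded network.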
First, the network structure and weights of the trained model were loaded, the image frames captured by the camera were obtained, and the loaded model was used to predict the steering angle. To ensure safety during testing, the difference between two consecutive predicted values was limited to at most 45 degrees. A Kalman filter was used to stabilize the output and to estimate an optimal value whenever an output was obviously unreasonable, so the model output was filtered with the Kalman filter. The experimental results show that with Kalman filtering the outputs were more stable, the vehicle ran more smoothly, and ride comfort was higher. The convergence of training is shown in Figure 6. The TensorFlow loss curve shows that the loss converged by the 18th epoch (1280 samples).
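The safety limit and the output filtering can be sketched as follows. The paper gives neither the Kalman filter's state model nor its noise parameters, so this assumes a one-dimensional constant-value model with hypothetical process noise `q` and measurement noise `r`, and it applies the 45-degree clamp before filtering; both the ordering and all names are assumptions.

```python
def clamp_step(prev, new, max_delta=45.0):
    """Limit the change between consecutive steering predictions (degrees)."""
    if prev is None:
        return new
    return max(prev - max_delta, min(prev + max_delta, new))


class ScalarKalman:
    """1-D Kalman filter with a constant-value state model.

    q (process noise), r (measurement noise), x0 and p0 are hypothetical settings.
    """

    def __init__(self, q=1.0, r=25.0, x0=0.0, p0=1e3):
        self.q, self.r, self.x, self.p = q, r, x0, p0

    def update(self, z):
        self.p += self.q                  # predict: state unchanged, uncertainty grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct toward the measurement z
        self.p *= (1.0 - k)               # shrink uncertainty after the correction
        return self.x
```

A larger `r` relative to `q` makes the filter trust individual predictions less, which smooths jitter in the steering output at the cost of slower response.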

Fig. 1. Structure of the decision network; $W_{out}$ is the weight matrix of the output layer.
Fig. 2.
Fig. 3.

Fig. 8. The result map captured from the training graph.