Driveable Area Detection Using Semantic Segmentation Deep Neural Network

. Autonomous vehicles use road images to detect roads, identify lanes, objects around the vehicle and other important pieces of information. This information retrieved from the road data helps in making appropriate driving decisions for autonomous vehicles. Road segmentation is such a technique that segments the road from the image. Many deep learning networks developed for semantic segmentation can be fine-tuned for road segmentation. The paper presents details of the segmentation of the driveable area from the road image using a semantic segmentation network. The semantic segmentation network used segments road into the driveable and alternate area separately. Driveable area and alternately driveable area on a road are semantically different, but it is a difficult computer vision task to differentiate between them since they are similar in texture, color, and other important features. However, due to the development of advanced Deep Convolutional Neural Networks and road datasets, the differentiation was possible. A result achieved in detecting the driveable area using a semantic segmentation network, DeepLab, on the Berkley Deep Drive dataset is reported.


Introduction
Advanced Driver Assist Systems (ADAS) and autonomous driving technology have greatly contributed to the explosive growth of the field of deep learning which was fueled by the massive collection and availability of road datasets for public usage.Among the many computer vision tasks involved in ADAS and Autonomous driving, road detection has been considered one of the most important topics of research especially in the scope of level 4 and level 5 of driving automation [1].The road perception algorithms are the first step for any subsequent path planner which aid in autonomous navigation.Remarkable progress has been achieved in road segmentation in the last decade both in feature-based and learning-based approaches.Automatic driving decisions collision-free on the segmentation roads from vehicles.Often highlevel information about roads such as ego-lane detection [2], object-lane relationships [3], etc is required for successful cognitive actions ensuring collision-free navigation.
With the advent of deep neural networks and publicly available datasets that aid in explicit labelling of road relates features such as drivable area, lane markings, etc there is a tremendous interest in extending the road detection problem to drivable area detection.In fact, much recent research had focused on the detection of flat areas which are considered to be drivable.Such high-level information allow self-driving cars to act very similarly to human drivers.The paper presents the details of the implementation of driveable area detection using a deep neural network developed for semantic segmentation.Driveable area detection involves precise segmentation of ego-lane i.e. the lane of travel of the vehicle on which the camera is mounted (egovehicle).The driveable area is segmented based on the availability of the lane markings.The dataset used for training in the research work presented is Berkeley Deep Drive (BDD) Dataset [4].

Related Work
Many researches and works have been going on to solve road segmentation problem since the development of deep neural networks.Even before the recent evolution of deep neural networks, road segmentation has been carried out using other computer vision algorithms, but the features learned by DCNNs were incomparably good.In a work, Road images were segmented into parking lanes, sidewalk and into a road using ariel and ground views of the image, with inputs from Camera, GPS, and IMU systems [5].Caltagirone L et al, in their work, detected road using DCNN with data from camera and LIDAR [6].They have applied FCN on a subset of the KITTI dataset for the experiment.In another work, a different approach was carried out to detect roads with minimal amounts of labeled data.The work used semi-supervised learning applied to inputs from camera and LIDAR data combined together.Yang X et.al extracted lane marking and detected road using a combination of Recurrent Neural Network and U-net to reduce the propagation error and improve the accuracy of detection [7].A model based on the conditional random field, which takes in inputs from both camera and LIDAR as random variables, was experimented by Xiao L [8].Various research has been carried out for road detection and lane marking.The introduction of the Berkley DeepDrive dataset gave rise to the experimentations on driveable area related research Semantic segmentation has evolved since the evolution of CNNs had taken heights.First of such state-of-the-art architectures, was developed by Long J et.al [9].Using the classification network's learned features, the model adapted the classification network into a fully convolutional network and fine-tuned it for semantic segmentation.Unlike [9], which has done end-to-end learning, the model proposed by Noh H et.al, which was built on the classification network VGG-16, integrated deep deconvolution network and proposals prediction [10].R-CNN model developed by Girshick R et.al applied convolutions on each of the region proposals extracted from the image and classified the regions into labels [11].This work gave rise to a new branch of research on combining region proposals and convolutions.Convolution and deconvolution on the image during segmentation, due to downsampling and pooling, misses the high-resolution information required for segmentation.In the model developed by Lin G et.al, to retain the high-level spatial information, residual connections between the convolution and deconvolution layers were given like identity mapping [12].An interesting work by Luc P combined the sematic segmentation network with an adversarial network.Using adversarial training, the inconsistencies between ground truth and the predicted map were detected and corrected [13].Then came the set of models that concentrated on dilated convolutions, that helped in getting the local information along with the global information, by increasing the receptive field of the network [14,15,16,17].One such model is DeepLab which combines deep convolutions, dilated convolutions at different range, and Conditional random fields.This model gained 79.7 percent mIOU on PASCAL VOC-2012 dataset.

Details of Dataset
To find a suitable dataset for the intended purpose of driveable area detection, the datasets available for road data were referred.PASCAL-VOC 2012 dataset [18], Microsoft COCO dataset [19], ADE20K dataset are some of the general semantic segmentation datasets.These datasets contain different objects of vast categories, thus helpful in training the network to identify any object present in the image.For some specific objects of interest to be identified by the network (e.g.road, car, signal, etc. for the autonomous driving scenario), the weights of the network can be fine-tuned by training it with objects specific dataset (e.g.Datasets with road objects).Some roadspecific datasets available are Cityscapes dataset [20], KITTI road dataset [21], Apolloscape dataset [22], Oxford RobotCar Dataset [23], Berkley Deep Drive dataset [4].Among these, the BDD dataset was chosen based on the merit of large scale annotations for driveable area segmentation.This information will be helpful in determining vehicle localization on the road with image data and in turn be helpful in decision making.Though few other datasets have annotations done for road segmentation, these were non-standard and very small in size.BDD dataset is promising in the scope of its attributes such as diverse nature, a sheer large number of annotations of images specifically made for driveable area detection.The dataset has images taken from a variety of geographical locations, environmental and weather conditions.The dataset has labels , "Directly Drivable Area", Alternatively Drivable Area"."Directly Drivable Area" is the area that the driver is currently driving on.The driver has priority of this area over the other cars."Alternately Drivable Area" is the area that the driver is not currently driving on , but can drive on it by changing lane.The dataset contains 91626 instances of Directly drivable area and 88392 instances of Alternatively drivable area.The annotation is given as a mask image which contains pixel level labels for drivable area, alternative area and the background.The dataset is also annotated for lane markings, object detection and instance segmentation.For all the work reported in this paper, the BDD dataset images with annotations for driveable area detection alone are utilized.The driveable area annotations consist of a pair of images for each example.One image is the raw RGB image of resolution 1280 × 720 pixels and the other image is the annotation image of same size which is also an RGB image.The annotation image has the ego-lane pixels in red, other lanes marked in blue and all other regions of the image in black.A sample image pair is shown in Fig. 1.

Deep Learning Network
The work presented in the paper is about the implementation details of applying a popular sematic segmentation network for the purpose for driveable area detection.Though conceptually driveable map inferencing is similar to the general semantic segmentation problem, the challenges involved in the former is very high.This is mainly because the classification is between two regions which are exactly identical in terms of all visual properties.The only differentiating element is the lane marking which is assumed to be of contrasting the color of the driveable area (roads).Deep convolutional neural networks have been the choice for various classification tasks in the computer vision system mainly attributed to the built-in invariance to local image transformations.But this property doesn't meet the requirements of dense inferences such as semantic segmentation where abstraction of information in the image is not a requirement.Among the Deep CNN based architectures available for semantic segmentation in the current work reported in the paper, DeepLab [24] is identified to be suitable for the intended purpose of driveable area detection.This is mainly because of its key property such as the good resolution of the features, good ability to tackle the problem of objects' existence at multiple scales and high localization accuracy which, in a deep CNN framework is difficult to obtain due to the invariance property.DeepLab enjoys these merits by strategically incorporating significant changes in the architecture.Some of the important changes introduced in the DeepLab which not only has benefited the semantic segmentation problem but also to a higher level of complexities such as the driveable area detection problem as follows: • Atrous convolution instead of convolution with downsampled filters: Generally CNNs would have a series of convolutional layers interleaved with pooling layer resulting in downsampling of filters.Atrous convolution does convolution after the last few pooling layers with upsampled filters instead of downsampled filters as in the case of regular convolution.This strategy basically improves the feature resolution which is very valuable in pixellevel inferences as in the case of driveable area detection.It may be noticed that the atrous convolution serves as a valuable replacement for the deconvolutional layers which is commonly found in most semantic segmentation networks [25].A high-level block diagram of the network architecture is shown in Fig. 2.

Fig. 2. Abstract Block Diagram of DeepLab Network
Beyond the specified advantages that DeepLab specifically addresses some important advantages that are relevant to the driveable detection problem from a practical stand-point are speed, accuracy, and simplicity of the network.Since the intended application is for a driving scenario real-time nature, accuracy and speed are primary concerns for any segment of algorithms.The detailed block diagram of the DeepLab network is presented in Fig. 3.

Results and Conclusion
In the lines of most semantic segmentation benchmarking, mIoU (mean intersection of union) is used as the evaluation metric to understand the performance of the network.Other evaluation metrics are not presented within the scope of this paper.Sample images from the validation set and the corresponding inference images, containing three classes viz., driveable area, alternately drivable area, background, are presented in Fig. 5.The intersection of union is measured between the inference images and the ground truth in the dataset [19].IoU for the first 48 images are presented in Fig.The IoU for driveable area detected using DeepLab semantic segmentation network is given in the paper.As part of the future work, driveable area detection would be carried out on stateof-the-art semantic segmentation networks and compared.And extraction of high level semantics useful for autonomous driving would be carried out using the detected driveable area.
Atrous spatial pyramid pooling: To address the problem of scale invariance, generally images of objects acquired at many scales are used.DeepLab utilizes a pyramid pooling strategy which is a computationally efficient way of ensuring scale invariance.The strategy involves the usage of multiple filters that have complementary fields of view.This ensures capturing of objects as well the context in multiple scales resulting in the scale invariance property.•Fully-connected CRFs: The property of accurate localization is incorporated through the use of fully connected pairwise conditional random fields.

Fig. 3 .
Fig. 3. Detailed Block Diagram of DeepLab Network The implementation of DeepLab network for the driveable area detection is carried out with Tensorflow API.Tensorflow provides pre-trained models of DeepLab which

Table 1 .
Key Training Specifications