Farmland Weed Species Identification Based on Computer Vision

To alleviate the difficulty of collecting indexes for the analysis of farmland weed communities, we implemented a computer vision-based method for identifying farmland weeds at the species level. A green vegetation binary image was obtained using the super-green and maximum interclass difference (OTSU) methods, and weeds were then separated from cultivated crops through repeated etching and the removal of small areas. A BP (back propagation) neural network was used for weed recognition; the morphological characteristics of the weeds and of each region remaining after etching were selected to construct the input matrix of the recognition model for training and testing the BP network. In an experiment on the identification of five weed species, the recognition accuracy of the computer vision method reached 96%. The results showed that the computer vision method could quickly and accurately extract a weed community analysis index, thereby providing a reference for the intelligent analysis of weed communities.


Introduction
Rice paddy weeds compete with rice for sunlight, nutrients, and water, which can severely impact the yield and quality of rice [1,2]. Conversely, weeds are an important component of a balanced and stable farmland ecosystem and play an important role in retaining soil moisture and fertilizers, accelerating nutrient cycling, and providing a living environment for soil organisms [3,4]. Studying farmland weed communities is therefore of considerable significance for ensuring food security and maintaining ecosystem stability. Weed species constitute the primary quantitative index in studying farmland weed communities. There are three primary methods for identifying weeds in the field: manual identification, remote sensing, and computer vision-based recognition. Currently, data in a fixed area are mainly obtained manually [5][6][7][8]. The investigation of weed species usually requires professional evaluation or the aid of a color atlas of weeds [3].
The development of computer vision and image processing technologies has played an important role in weed species identification [12][13][14][15][16][17]. Weed species have mainly been identified by extracting morphological, color, and texture parameters and using these characteristic parameters to establish a weed identification model. The identification models commonly used include the camera supported model and the artificial neural network model [16][17][18][19]. Some scholars have used leaf color to identify weeds at the species level, but the recognition speed and accuracy are low and require further improvement. Other scholars have used morphological characteristics for identification; however, these characteristics are difficult to extract because leaves tend to intertwine in the natural environment, so this approach is not conducive to outdoor weed identification. Accurate weed identification therefore cannot be achieved using a single factor, and multiple features are required for a comprehensive analysis. In this study, an identification model for weed species was established from multiple characteristic parameters by extracting morphological, color, and texture parameters. Current research mainly uses computer vision systems to distinguish weeds from crop plants and the soil background in order to calculate their distribution [16,19]. Few studies have focused on the identification of farmland weed communities.
In this study, we developed a computer vision extraction method for weed species, thereby providing a reference for the intelligent analysis of farmland weed communities.

Materials and Methods
The images used in this study were obtained from rice-wheat rotation fields from 2014 to 2016. Five common species of field weeds were selected for the species recognition experiments. There were 40 images in total (750 × 750 pixels) for each weed species. The identification process is shown in Fig. 1.
Fig. 1 The weed species identification process

Extraction
As shown in Fig. 2a, the original image includes rice plants, Monochoria seedlings, duckweed, and Spirogyra. The green vegetation of the samples was extracted using the super-green method (Eq. 1) and the OTSU method [17], and the extraction results are shown in Fig. 2b (including crops, weeds, and Spirogyra). In the RGB images (Fig. 3), the green-band values of Spirogyra were lower than those of the weeds and crops, which allowed Spirogyra to be removed by applying the OTSU algorithm to the green component. We first removed the areas with fewer than 1,000 pixels in Fig. 4a, then etched (eroded) the binary images with a 3 × 3 template and removed the areas with fewer than 800 pixels. We then repeated the etching twice, removing the areas with fewer than 600 pixels and 400 pixels, respectively. Finally, we performed three expansion (dilation) operations on the resulting binary images using the 3 × 3 template, filled the smaller holes in each region using the hole-filling algorithm [21], and obtained the binary images of the weeds.
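The extraction steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the super-green index is assumed to take the common form 2G − R − B (Eq. 1 is not reproduced in this text) and is clipped to 0..255 before thresholding, regions are assumed to be 8-connected, pixels outside the image are treated as background, and the Spirogyra removal and hole-filling steps are omitted. All function names are hypothetical.

```python
from collections import deque

def otsu_threshold(values, levels=256):
    """Threshold that maximises the between-class variance (OTSU)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, sum_all = len(values), sum(i * n for i, n in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(levels):
        w0 += hist[t]              # number of pixels with value <= t
        sum0 += t * hist[t]        # intensity sum of those pixels
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var_between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def green_mask(image):
    """Binary vegetation mask from rows of (R, G, B) tuples.
    Assumption: super-green index = 2G - R - B, clipped to 0..255."""
    exg = [[min(255, max(0, 2 * g - r - b)) for r, g, b in row] for row in image]
    t = otsu_threshold([v for row in exg for v in row])
    return [[int(v > t) for v in row] for row in exg]

def morph3x3(mask, keep):
    """3 x 3 erosion (keep=all) or dilation (keep=any); outside counts as 0."""
    h, w = len(mask), len(mask[0])
    def window(y, x):
        return [mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return [[int(keep(window(y, x))) for x in range(w)] for y in range(h)]

def remove_small_regions(mask, min_area):
    """Drop 8-connected foreground regions with fewer than min_area pixels."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, region = deque([(y0, x0)]), []
            while queue:               # breadth-first flood fill of one region
                y, x = queue.popleft()
                region.append((y, x))
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) >= min_area:
                for y, x in region:
                    out[y][x] = 1
    return out

def isolate_weeds(mask, min_areas=(1000, 800, 600, 400)):
    """Area filter, three erode-and-filter rounds, then three dilations."""
    mask = remove_small_regions(mask, min_areas[0])
    for min_area in min_areas[1:]:
        mask = remove_small_regions(morph3x3(mask, all), min_area)
    for _ in range(3):
        mask = morph3x3(mask, any)
    return mask
```

A call such as `isolate_weeds(green_mask(image))` would apply the area thresholds quoted in the text; for images much smaller than 750 × 750 pixels, the `min_areas` tuple would need to be scaled down accordingly.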

Weed Species Identification
Weeds are difficult to identify at the seedling stage. We therefore selected Galium aparine, Veronica didyma, Chenopodium serotinum, Monochoria vaginalis, and Vicia sativa as study objects. The binary images of the weeds (Fig. 4) were obtained using the green vegetation extraction method detailed in the Extraction section.

Feature Parameter Selection
Due to the influence of light, image acquisition equipment, and other factors, the color and texture characteristics of the same weed species obtained in the wild often differ. However, the morphological characteristics are not affected by these environmental factors. We therefore selected the following morphological parameters to construct the feature vector for identifying weed species: compactness, quasi circularity, foliaceous, and centrifugal rate. In their definitions, A is the area of the binary region, P is the circumference of the binary region, M is the major axis length of the smallest circumscribed rectangle, R is the shortest distance from the regional centroid to the boundary, S is the minor axis length of the region, and L is the major axis length of the region. To better reflect the blade shape and blade configuration, we etched the region using a template whose size was approximately 1/60 of that of the binary region. The number N of binary regions remaining in the image after etching is one characteristic value, and the mean and variance of the quasi circularity, foliaceous, and centrifugal rate of each region constitute further characteristic values used to identify the weed species.
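As a rough illustration of how the 11 characteristic values might be assembled, the sketch below combines four whole-weed parameters, the region count N, and the mean and variance of three per-region parameters (4 + 1 + 6 = 11). The dictionary keys, the ordering of the vector, and the use of the population variance are assumptions made for this example; the parameter values themselves are taken as already computed.

```python
from statistics import mean, pvariance

# Hypothetical key names for the parameters named in the text.
PER_REGION = ("quasi_circularity", "foliaceous", "centrifugal_rate")
WHOLE_WEED = ("compactness",) + PER_REGION

def build_feature_vector(whole_weed, regions):
    """Return the 11 characteristic values for one image.

    whole_weed: dict with the four parameters of the whole weed region.
    regions:    list of dicts, one per region remaining after etching.
    """
    if not regions:
        raise ValueError("at least one region must remain after etching")
    vector = [whole_weed[key] for key in WHOLE_WEED]   # 4 whole-weed values
    vector.append(len(regions))                        # N, the region count
    for key in PER_REGION:                             # 3 x (mean, variance)
        values = [region[key] for region in regions]
        vector.append(mean(values))
        vector.append(pvariance(values))               # population variance (assumed)
    return vector
```

Stacking one such vector per image column by column would give an 11 × 100 input matrix for 100 images.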

Selection of the Recognition Model
In this study, we used a BP (back propagation) neural network to identify the five weed species. The BP neural network is a multi-layer feedforward network trained with the error back-propagation algorithm and is one of the most widely used neural network models. A BP network can learn and store a large number of input-output mappings without a prior description of the mapping relationship [22]. In this study, 20 images of each species were selected for the construction and training of the network. The network input data consisted of the 11 characteristic values described in the Feature Parameter Selection section, extracted from each image. The input data were normalized to increase the network convergence speed and accuracy [23]. A three-layer BP neural network was constructed, with an input layer of 11 neurons corresponding to the 11 characteristic values, a hidden layer of 20 neurons, and an output layer of one neuron. The learning rate was set to 0.5, the inertial (momentum) coefficient to 0.8, the number of iterations to 1,000, and the target error to 0.001. The hidden layer transfer function was of the Tansig type (Eq. 6).

Results
The 100 samples for the verification of the recognition model were numbered in sequence: 1-20 for G. aparine, 21-40 for V. didyma, 41-60 for C. serotinum, 61-80 for M. vaginalis, and 81-100 for V. sativa. We extracted the 11 characteristic values of each sample to construct the 11 × 100 input matrix and normalized the data. The normalized test samples were input into the network to perform the simulation. The output result of the simulation is shown in Fig. 7, and all test samples were identified.
When only the four feature values of the whole weeds were used to train and develop the method, without the morphological characteristics of each region following etching, the recognition accuracy was only 87%. The accuracy was even lower for G. aparine and V. didyma, at only 80% and 85%, respectively. The simulation output result is shown in Fig. 8.
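Overall and per-species accuracies of the kind reported above can be derived from the scalar network outputs as in the following sketch. The decision rule is an assumption: the text does not state how a continuous output is mapped to a class, so the example rounds to the nearest class code (1 to 5) and clamps out-of-range values. The function names are hypothetical.

```python
import math

SPECIES = ("G. aparine", "V. didyma", "C. serotinum", "M. vaginalis", "V. sativa")

def decode(output):
    """Map a scalar network output to a class code 1..5.
    Assumed rule: round half up, then clamp to the valid range."""
    return min(len(SPECIES), max(1, math.floor(output + 0.5)))

def accuracy_by_species(outputs, per_species=20):
    """Accuracy per species and overall.

    outputs must be ordered as in the text: samples 1-20 belong to the
    first species, 21-40 to the second, and so on.
    """
    report, hits_total = {}, 0
    for index, name in enumerate(SPECIES):
        block = outputs[index * per_species:(index + 1) * per_species]
        hits = sum(decode(value) == index + 1 for value in block)
        report[name] = hits / len(block)
        hits_total += hits
    return report, hits_total / len(outputs)
```

Because the network has a single output neuron, a misclassified sample typically shows up as an output that lies closer to a neighbouring class code than to its own.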

Discussion
Common methods of farmland weed recognition include the threshold segmentation method based on machine vision, the weed identification method based on a genetic algorithm, the weed identification method based on the polarization spectrum, and the weed identification method based on the BP network. The machine vision-based threshold segmentation method segments the image and recognizes weeds according to their external characteristics using a threshold segmentation model, with a recognition rate of about 85% [24]. The genetic algorithm-based method can identify weeds quickly and accurately while reducing the occurrence of local optima, with a recognition rate of about 93% [25]. The polarization spectrum-based method identifies weeds by analyzing and comparing their spectral response, spectral characteristics, and decision precision, and its comprehensive recognition rate across different polarization states exceeds 90% [26]. After training on samples, the BP neural network-based method can recognize 96% of weeds at the seedling stage. In practical applications, the most suitable identification algorithm can be selected according to the requirements of the site.

Conclusion
The correct identification of weed species in farmlands is a prerequisite for the analysis of farmland weed communities. In this study, we proposed a weed extraction method for the field environment and successfully identified weed species in this environment using a combination of 11 morphological features and a BP neural network. We reached the following conclusions:
1) We proposed a computer vision-based weed extraction method for the field environment. The green vegetation in the field could be successfully extracted by using the super-green method and the OTSU algorithm. The effect of Spirogyra on the paddy field images was eliminated by applying the OTSU algorithm to the green component in the RGB mode.
2) We proposed a weed identification method based on the BP neural network. Including the morphological characteristics both of the weeds and of each area following etching better reflected the weed species. When only the weed characteristic values were used, the recognition accuracy rate was 87%, and the accuracy rate did not exceed 85% for the identification of G. aparine and V. didyma, which share similar morphological characteristics. The network trained with both types of characteristic values at the same time was thus able to identify the samples. This indicated that the method proposed in this paper could effectively identify and classify field weeds with a recognition accuracy rate above 96%.
3) In this paper, we segmented and identified five common weed species occurring in the field environment. The process was affected by neither the external lighting conditions nor the image acquisition equipment, has wide adaptability, and can provide a reference for the recognition of other weeds under natural conditions. However, weeds at the seedling stage were selected for the weed identification experiment, and thus the identification of weeds at maturity still requires further study.

$$f(x) = \frac{2}{1 + e^{-ax}} - 1 \tag{6}$$
where a is the slope parameter of the Tansig function, and a = 2 in this paper. The training process of the weed recognition model is shown in Fig. 5. The 100 network training samples were numbered in sequence: 1-20 for G. aparine, 21-40 for V. didyma, 41-60 for C. serotinum, 61-80 for M. vaginalis, and 81-100 for V. sativa. The input sample matrix for network training was 11 × 100 and the output matrix was 1 × 100, with each element giving the class of the corresponding sample, coded as 1, 2, 3, 4, or 5 for the five species, respectively. After 1,000 iterations, the target error of 0.001 was achieved. The training results are shown in Fig. 6. All the training samples were correctly identified.
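The network structure and training settings described in this paper can be illustrated with the short sketch below. It assumes the usual form of the Tansig function, f(x) = 2 / (1 + e^(-ax)) - 1, which for a = 2 coincides with tanh(x), a linear output neuron, small random initial weights, and the standard momentum update rule; none of these details is confirmed by the text, and the function names are hypothetical. Only the forward pass and a single weight update are shown, not the full back-propagation loop.

```python
import math
import random

def tansig(x, a=2.0):
    """Tansig transfer function with slope parameter a; a = 2 equals tanh(x)."""
    return 2.0 / (1.0 + math.exp(-a * x)) - 1.0

def init_layer(n_out, n_in, rng):
    """One weight row per neuron; the last entry of each row is the bias.
    Assumed initialisation: weights uniform in [-0.5, 0.5], zero biases."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] + [0.0]
            for _ in range(n_out)]

def forward(x, hidden, output):
    """11 inputs -> 20 Tansig hidden neurons -> 1 output (linear, assumed)."""
    h = [tansig(sum(w * v for w, v in zip(row, x + [1.0]))) for row in hidden]
    return sum(w * v for w, v in zip(output[0], h + [1.0]))

def momentum_step(weight, gradient, prev_delta, lr=0.5, momentum=0.8):
    """Gradient descent with momentum:
    delta = -lr * gradient + momentum * prev_delta."""
    delta = -lr * gradient + momentum * prev_delta
    return weight + delta, delta

rng = random.Random(0)                  # fixed seed for reproducibility
hidden_layer = init_layer(20, 11, rng)  # 20 hidden neurons, 11 inputs each
output_layer = init_layer(1, 20, rng)   # 1 output neuron, 20 inputs
```

Here `lr=0.5` and `momentum=0.8` mirror the learning rate and inertial coefficient quoted in the paper; in a full training loop, `momentum_step` would be applied to every weight once per iteration until the target error of 0.001 or the limit of 1,000 iterations is reached.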