Hyperspectral Estimation Methods for Chlorophyll Content of Apple Based on Random Forest

Chlorophyll content is a good indicator of nutritional stress, photosynthesis, and other physiological states of fruit trees. Ten vegetation indices were selected as input variables for the RF model, and the number of input variables was gradually increased from 1 to 10. The modeling accuracy of the 10 resulting RF models was compared. Finally, two estimation models, the RF model with the original spectrum and the optimal RF model with vegetation indices, were established and compared. For modeling accuracy, the R² of the two models was 0.527 and 0.609, and the RMSE was 8.728 and 7.930 g/cm², respectively. For validation accuracy, the R² was 0.411 and 0.843, and the RMSE was 14.455 and 11.034 g/cm², respectively. The results showed that (1) the accuracy of the RF model with vegetation indices is higher than that of the RF model with the original spectrum, and (2) the RF model with vegetation indices can estimate the chlorophyll content of apple leaves more accurately, providing a new method for the accurate estimation of chlorophyll in apple leaves.


Introduction
Chlorophyll content is an important biochemical parameter in the growth of fruit trees. It reflects their photosynthetic capacity, developmental stage, and nutritional status, and it indicates whether the trees are affected by environmental stress or disease [1][2]. Extracting leaf chlorophyll content from hyperspectral data is therefore important for monitoring the growth status of fruit trees and for nutritional diagnosis. In recent years, remote sensing monitoring of fruit trees has made some progress at home and abroad. A model relating the red-edge spectrum to grape chlorophyll content was established, with an RMSE (root mean square error) below 30 mg·m⁻². Support vector machine and partial least squares models were built from the original apple leaf spectrum and the wavelet-filtered leaf spectrum to estimate chlorophyll content [4]. The correlation of chlorophyll content with the original and first-order differential spectra of apple leaves was analyzed, and leaf chlorophyll regression models were established with spectral position, vegetation index, and spectral area as variables, respectively; the exponential model built on the blue-edge position had higher estimation accuracy [5]. For pear leaves, total nitrogen content was related to the sensitive bands of the original and first-order differential spectra by stepwise regression analysis, and the model built on the first-order differential spectrum was chosen as the leaf total nitrogen estimation model [6]. Chlorophyll content estimation models were established from the original and first-order differential spectra of apple leaves using principal component analysis and stepwise regression analysis [7]. Researchers have thus made a number of attempts to estimate the chlorophyll content of fruit tree leaves.
In remote sensing monitoring of fruit trees, the random forest (RF) method has rarely been used, although RF has been applied as a machine learning method in agricultural remote sensing. Vegetation indices, RF, and artificial neural networks were used to invert the leaf area index of winter wheat from environmental satellite data [8]. RF models driven by vegetation indices were used to estimate the SPAD (soil and plant analyzer development) value of winter wheat from Gaofen-1 satellite data [9]. SPAD inversion algorithms for the jointing, booting, and flowering stages of wheat were established with the RF regression algorithm [10]. A hyperspectral model of the leaf area index of apple trees was built using support vector machine and RF [11]. The potential of RF for estimating winter wheat biomass was confirmed by models that combined RF with the correlation coefficient, gray correlation, and out-of-bag data importance, respectively [12].
An analysis of domestic and foreign research shows that most studies performed nutritional diagnosis for a single growth period of fruit trees, so evaluation criteria and guidance for nutritional status over the whole growth cycle are lacking [4]. This article estimates the chlorophyll content of apple leaves over the whole growth period in 2 consecutive years. In addition, most studies model with only the chlorophyll-sensitive bands or only vegetation indices, and do not compare the two.
In summary, two RF models were built, one on the sensitive bands of the original spectrum and one on vegetation indices, and their accuracy was compared. The aim of this study was to apply the RF model to the estimation of chlorophyll content and to select the optimal model for apple leaves, in order to provide a guide for the rapid estimation of chlorophyll content.

Overview of the study area
The experiment was carried out from 2012 to 2013 in the apple orchard of Xiazhai Village, Chaoquan Town, Feicheng City, Shandong Province. The row spacing was 5 m, the plant spacing was 3 m, the tree height was about 3 m, the tree trunk was about 0.5 m, and the trees were spindle-shaped. Leaf spectra were measured with an ASD hyperspectral spectrometer, avoiding the leaf veins. The spectral range of the spectrometer is 350 to 2500 nm, with an interval of 1 nm. Each leaf was measured at four positions (two on each side of the main vein, taking care to cover the entire blade), and the average of the four reflectances was taken as the reflectance of the leaf. Before measurement, the instrument was calibrated with the standard whiteboard of the leaf holder.

Determination of chlorophyll content in apple leaves
Leaf chlorophyll was sampled at the positions used for the leaf spectral measurement, and leaf chlorophyll content was measured by a chemical method. Four leaves from each tree were punched: 8 holes were punched in each leaf, avoiding the veins and covering the entire blade, at the positions corresponding to the spectral measurements. The sample mass was about 0.2 g. The sample was then placed in 95% ethanol solution and left in the dark for 24 to 48 hours until the leaf tissue turned white. The chlorophyll content of the apple leaves (g/cm²) was determined with an ultraviolet spectrophotometer [13]. The data collected in this study are listed in Table 1, and the statistical characteristics of the chlorophyll data are shown in Table 2.

Vegetation index selection
According to previous studies, 25 vegetation indices that correlate well with chlorophyll were selected as candidate variables for estimating chlorophyll content, as shown in Table 3.
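The indices in Table 3 are not reproduced in this text, so the following is only a minimal sketch of how a band-based vegetation index can be computed from a leaf reflectance spectrum. The two index forms (normalized difference and simple ratio) are generic, the function names are hypothetical, and the reflectance values and band pairing are illustrative assumptions, not data or definitions from this study.

```python
def normalized_difference(reflectance, band_a, band_b):
    """Return (R_a - R_b) / (R_a + R_b) for two wavelengths given in nm."""
    r_a = reflectance[band_a]
    r_b = reflectance[band_b]
    denom = r_a + r_b
    if denom == 0:
        raise ValueError("reflectances sum to zero; index is undefined")
    return (r_a - r_b) / denom


def simple_ratio(reflectance, band_a, band_b):
    """Return R_a / R_b for two wavelengths given in nm."""
    return reflectance[band_a] / reflectance[band_b]


# Hypothetical leaf spectrum: wavelength (nm) -> reflectance (illustrative values).
leaf = {554: 0.12, 708: 0.20, 995: 0.48}

# Illustrative band pairing only; the study's actual indices are listed in Table 3.
nd = normalized_difference(leaf, 995, 708)
sr = simple_ratio(leaf, 995, 708)
print(f"normalized difference: {nd:.3f}, simple ratio: {sr:.3f}")
```

With a spectrum sampled at 1 nm intervals, as described for the spectrometer, a dictionary keyed by wavelength keeps the band lookup explicit.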

Random forest
RF is a machine learning algorithm published by the American scientist Leo Breiman [35] in 2001. RF uses the bootstrap sampling method to draw multiple samples from the original sample set, fits a decision tree to each bootstrap sample, and then combines the predictions of the trees; the final result is decided by voting (for regression, by averaging the tree predictions).
Bagging [36] is part of RF theory. Assume that the sample set has size N and that n samples are drawn in each bootstrap draw; the samples that are not drawn are called out-of-bag (OOB) data. The OOB data can be used to estimate the strength of the individual trees in the RF: the greater the strength, the smaller the generalization error of the RF and the more accurate the prediction [37]. In this study, the importance of each vegetation index for chlorophyll content was estimated and ranked using the OOB data in RF. The top-ranked vegetation indices were then used as input variables of the RF model. After trials, the number of decision trees was set to 1000.
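The bootstrap and out-of-bag idea can be illustrated with a short sketch. It assumes the conventional setting in which each tree draws N indices with replacement from N samples (the text leaves the per-draw size n unspecified), and it uses the modeling sample size (299) and tree count (1000) reported in this paper. The function name `bootstrap_split` is hypothetical, and the sketch only shows the sampling step, not tree fitting.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_samples = 299         # size of the 2012 modeling set reported in the paper
n_trees = 1000          # number of decision trees reported in the paper

oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(oob) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
print(f"mean OOB fraction over {n_trees} trees: {mean_oob:.3f}")
```

Under this assumption, roughly a third of the samples are left out of each tree's bootstrap sample, which is what makes OOB data usable as an internal validation set and as the basis for ranking variable importance.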

Statistical analysis
In this paper, the coefficient of determination (R²) and the RMSE were used as the criteria for evaluating modeling and validation accuracy. In general, the smaller the RMSE and the larger the R², the higher the accuracy of the model.
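As a minimal sketch, the two criteria can be computed as follows. The paper does not state which definition of R² it uses, so the form below (one minus the ratio of residual to total sum of squares) is an assumption; the squared Pearson correlation between observed and predicted values is another common choice. The sample values are invented for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Invented chlorophyll values, for illustration only.
observed = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 63.0, 69.0]

print(f"R2 = {r_squared(observed, predicted):.3f}")   # R2 = 0.964
print(f"RMSE = {rmse(observed, predicted):.3f}")      # RMSE = 2.121
```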

Correlative analysis of chlorophyll content and spectrum in leaves
The correlation between chlorophyll content and the original spectrum of apple leaves in 2012 is shown in Fig 2. There

RF model based on original spectrum
The RF model was constructed using the reflectance at the sensitive bands of the original spectrum, 554, 708, and 995 nm (R554, R708, and R995), and was validated with the corresponding spectra and chlorophyll content from 2013. The modeling R² was 0.527 and the RMSE was 8.728 g/cm², as shown in Table 4.
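Sensitive bands of this kind are typically found by correlating the reflectance at each wavelength with chlorophyll content across leaves. The sketch below assumes the Pearson correlation coefficient and ranks bands by its absolute value; the function names and all data values are hypothetical and chosen only to make the example run.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_bands(spectra, chlorophyll):
    """Rank wavelengths by |r| between band reflectance and chlorophyll.

    spectra maps wavelength (nm) -> list of reflectances, one per leaf.
    """
    corr = {wl: pearson_r(vals, chlorophyll) for wl, vals in spectra.items()}
    return sorted(corr.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented data for four leaves, for illustration only.
chlorophyll = [30.0, 40.0, 50.0, 60.0]
spectra = {
    554: [0.20, 0.17, 0.15, 0.11],  # reflectance falls as chlorophyll rises
    708: [0.30, 0.27, 0.22, 0.20],
    995: [0.45, 0.47, 0.44, 0.46],  # essentially unrelated in this toy set
}

for wavelength, r in rank_bands(spectra, chlorophyll):
    print(f"{wavelength} nm: r = {r:+.3f}")
```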

Screening of vegetation index
The importance of each vegetation index for chlorophyll content was calculated using the OOB importance estimation method, and the indices were sorted accordingly. To keep the model simple and practical, this paper considers only the top 10 vegetation indices after sorting, as shown in Table 5. The single bands at 554 nm, 708 nm, and 995 nm, whose original reflectance correlated well with the chlorophyll content of apple leaves, were selected as sensitive bands. These three bands were used as the variables of the RF model with the original spectrum.
For modeling accuracy, the R² of the RF models with sensitive bands and with vegetation indices was 0.527 and 0.609, and the RMSE was 8.728 and 7.930 g/cm², respectively. For validation accuracy, the R² was 0.411 and 0.843, and the RMSE was 14.455 and 11.034 g/cm², respectively.
The results showed that the RF model based on vegetation indices estimates chlorophyll content more accurately than the RF model based on the original spectrum. The RF model can therefore be applied to the estimation of chlorophyll in apple leaves.

Fig 1 Study area map

The model was established using the chlorophyll content and leaf spectral reflectance data of apple leaves collected in 2012 (n = 299), and the accuracy of the RF models with sensitive bands and with vegetation indices was verified with the data collected in 2013 (n = 180).

Fig 2 Correlation between spectrum and chlorophyll content
Fig 3a Validation of RF model with original spectrum
Fig 3b Validation of RF model with vegetation indices

Table 1
List of data acquisition at each measured time

Table 3
Summary of spectral indices related to chlorophyll content

Table 4
The accuracy of RF model with original spectrum

Table 5
Ranking of the relation between vegetation indices and chlorophyll by OOB importance

According to OOB importance, the first 10 vegetation indices were selected, and the number of input vegetation indices was increased step by step to establish chlorophyll content estimation models. The modeling results are shown in Table 6. When the number of input vegetation indices increases from 1 to 5, R² shows an overall increasing trend and the RMSE decreases. With 5 vegetation indices, R² reaches its maximum, 0.609, and the RMSE its minimum, 7.930 g/cm². With 6 to 10 vegetation indices, R² rises from 0.597 to 0.606 and the RMSE decreases from 8.067 g/cm² to 7.966 g/cm², but R² remains below, and the RMSE above, the values obtained with 5 vegetation indices. Therefore, among the top 10 vegetation indices, the RF model constructed from the first five is the optimal model.
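The stepwise procedure described here can be sketched as a loop over the importance-ranked indices. This is an illustrative outline only: `best_prefix_size` and `toy_evaluate` are hypothetical names, the index labels are placeholders, and the toy scoring function stands in for fitting an RF model and computing its R² and RMSE; none of its numbers come from Table 6.

```python
def best_prefix_size(ranked_features, evaluate):
    """Score models built on the top-k features for k = 1..len(ranked_features).

    evaluate(features) must return (r2, rmse) for a model fitted on those
    features. Returns (k, r2, rmse) for the highest R2; ties keep the smaller k.
    """
    best = None
    for k in range(1, len(ranked_features) + 1):
        r2, rmse = evaluate(ranked_features[:k])
        if best is None or r2 > best[1]:
            best = (k, r2, rmse)
    return best


def toy_evaluate(features):
    """Stand-in for model fitting: accuracy peaks at three features (invented)."""
    k = len(features)
    return 0.6 - 0.02 * abs(k - 3), 8.0 + 0.1 * abs(k - 3)


# Placeholder names for the ten top-ranked indices.
ranked = [f"VI_{i}" for i in range(1, 11)]
k, r2, rmse = best_prefix_size(ranked, toy_evaluate)
print(f"best number of indices: {k} (R2 = {r2:.3f}, RMSE = {rmse:.3f})")
```

Keeping the smaller k on ties favors the simpler model, in line with the stated preference for operability and simplicity.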

Estimation and verification of chlorophyll content
The validation accuracy of the RF models is shown in Fig 3. For the RF model based on the original spectrum, the R² and RMSE of validation were 0.411 and 14.455 g/cm², respectively. For the RF model with vegetation indices, the validation R² was 0.843 and the RMSE was 11.034 g/cm². The results indicate that the RF model based on vegetation index has