Modelling and Prediction of Soil Electrical Conductivity and pH in Semi-arid Grassland Using VIS-NIR Spectroscopy

Electrical conductivity (EC) and pH are key indicators of soil physical and chemical properties. They reflect soil acidity and alkalinity and therefore influence vegetation growth. Spectroscopy can estimate EC and pH rapidly and efficiently, providing useful information for real-time soil management in semi-arid rangeland and grassland. We selected a semi-arid grassland of northern China covering about 200 km2 as the study area because it is highly sensitive to the effects of grazing and mining. Soil samples were collected from 72 sampling sites covering grazing exclusion, overgrazed and grassland restoration areas. An SVC HR-1024 spectroradiometer was used to acquire the soil spectra. This study aims to identify the spectral characteristics of soil EC and pH, and to propose a prediction method with the optimal input spectral region and transformation by comparing support vector machine (SVM) regression and partial least squares regression (PLSR). Our results showed that: 1) once the EC value exceeds 0.10 μS/m, soil spectral reflectance decreases with increasing EC, and the absorption depth, width and area at 1900 nm also decrease with increasing EC; 2) EC and pH are both positively correlated with soil spectral reflectance, and the highest correlation coefficient between pH and reflectance (0.7) is found in the visible region around 500 nm; 3) the SVM method produces higher prediction accuracy (RPD = 2.18, RMSE = 0.035, R2 = 0.78 for EC; RPD > 3, RMSE = 0.349, R2 = 0.91 for pH) than the PLSR method in soil EC and pH prediction.
This study indicates that spectroradiometer measurements can be used to predict the EC and pH of soil from semi-arid grassland, which provides a basis for detecting soil acidity and alkalinity with hyperspectral remote sensing.


Introduction
Soil salinity is recognized worldwide as a major threat to agriculture, especially in arid and semi-arid regions [1].
Soil electrical conductivity and pH can serve as indicators for detecting soil salinization in arid and semi-arid soils [2]. It is vital to assess the extent and severity of salinization at an early stage in semi-arid areas [3]. Conventional methods for acquiring soil properties are limited by the large amounts of labor and chemicals they require. Reflectance responds to soil changes quickly and efficiently, in real time, and can provide useful information [4]. For example, hyperspectral technology has been used to produce soil surface maps for assessing soil salinity [5].
Soil constituents have unique absorption features over the visible (350-700 nm) and near-infrared (700-2500 nm) regions due to overtones related to stretching and bending vibrations of molecular bonds such as C-C, C-H, N-H and O-H [6]. The NIR and SWIR regions have potential for estimating soil salinity levels. Salt minerals have been found to show distinct absorption features at 1450 nm, 1900 nm and 2200 nm [7]. Many multivariate techniques use such spectral features to construct predictive inversion models, which may be applicable to non-surveyed areas with similar geographical conditions. Linear models, such as PLSR and principal component analysis, have been widely applied to soil organic carbon prediction. Farifteh et al. showed that PLSR permits reliable estimation of soil EC at the experimental level for soils from the island of Texel in the northwest of the Netherlands [3]. A support vector machine (SVM) is a kernel-based learning method from statistical learning theory that can learn in a high-dimensional feature space from relatively few training data. This method has attracted the attention of researchers and has been widely applied in hyperspectral data analysis [8,9].
The semi-arid rangeland of north-east China occupies over 20% of the total grassland area in China. The main grassland in this region has been shaped by drought, grazing and fire [10] and, more recently but more dramatically, by mining activities [11]. At the regional scale, a national soil VIS-NIR library has been reported to perform better than local-scale models [12]. However, local calibrations of soil spectroscopic models built from field sampling sets may be more accurate than national calibrations [13]. Thus, a specific EC and pH prediction model should be developed for the local semi-arid rangeland environment. This study aims to identify the spectral characteristics of soil EC and pH, and to propose a prediction method with the optimal input spectral region and transformation by comparing robust regression, support vector machine (SVM) regression and partial least squares regression (PLSR).

Study area and sample set
The study area is located in the central part of the Hulunber meadow steppe. Land use types include open-pit mine, rehabilitated mine land, lake, farmland, fenced pasture and grazing pasture. The main soil type is chestnut soil. A stratified random sampling method was used to set up 72 sample points in the study area (Figure 1). The strata were defined from a land use/cover map derived from high-resolution imagery of 2015, a topographic map and the mine rehabilitation plan. The soil samples were collected in August 2015. Four surface soil samples, 10 cm deep and 5 cm wide, were collected at each sampling point, packed in a 20 cm box, and then transported to and processed in the lab. All datasets were selected to represent the full range of soil organic matter. The main pretreatment of soils for VIS-NIR analysis in the laboratory consisted of drying and sieving, which ensures a constant particle-size effect on the spectra. During analysis and testing, the soil-to-water ratio used for the water extract is commonly 1:5. The 1:5 soil-water extract was prepared by passing a dried soil sample through a 2 mm sieve, taking 30 g of it, adding 150 ml of carbon-dioxide-free water, shaking the stoppered container for 3 min, and then degassing and filtering. The electrical conductivity of the resulting water extract was measured with a Mettler SevenMulti pH/electrical conductivity meter. The same instrument was used to measure the pH of the supernatant of a mixture of 10 g of soil sample and 25 ml of distilled water. The statistics of the EC and pH data of the 72 soil samples are listed in Table 1.

Laboratory spectral measurement
In our study, the measurements were taken in the lab under the same environmental conditions using the field spectroradiometer, which ensures that the spectra of all soil samples are comparable. The SVC HR-1024 spectrometer was positioned vertically 30 cm above the surface of the soil sample and fitted with a 4° field-of-view lens, so that the measured area of the soil was a circle about 2 cm in diameter. The first step of each spectral test was a white reference panel measurement. Each soil sample was then measured at four angles, with the circular aluminum box rotated by 90° between angles. Five repeat measurements were taken at each angle, giving 20 measurements per sample, and their average was taken as the reflectance spectrum of the soil sample.
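The averaging step of this protocol can be sketched as follows. This is a minimal illustration only: the array layout (rotations, repeats, bands), the function name `average_sample_spectrum` and the synthetic numbers are assumptions made for the example, not part of the original workflow.

```python
import numpy as np

def average_sample_spectrum(scans):
    """Average repeated scans of one soil sample into a single spectrum.

    scans: array of shape (n_rotations, n_repeats, n_bands), e.g. (4, 5, n_bands)
    for four 90-degree rotations with five repeats each (20 scans in total).
    """
    scans = np.asarray(scans, dtype=float)
    if scans.ndim != 3:
        raise ValueError("expected shape (n_rotations, n_repeats, n_bands)")
    # Mean over both the rotation axis and the repeat axis.
    return scans.mean(axis=(0, 1))

# Synthetic stand-in data (not measured values): 4 rotations x 5 repeats x 6 bands.
rng = np.random.default_rng(0)
true_spectrum = np.linspace(0.10, 0.40, 6)
scans = true_spectrum + rng.normal(0.0, 0.005, size=(4, 5, 6))
mean_spectrum = average_sample_spectrum(scans)
```

Averaging over rotations reduces the effect of surface roughness and illumination direction, while averaging over repeats reduces instrument noise.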

Spectral data preprocessing
Spectral preprocessing with mathematical functions is commonly used to correct for non-linearity, sample variations and noisy spectra. It can also highlight spectral characteristics and improve the accuracy of prediction models. In this study, the preprocessing methods include the logarithm of the reciprocal of reflectance (log 1/R) [14] and the first derivative of reflectance (DR) at each waveband, which removes baseline effects [6]. The first derivative (R′) is computed as shown in Eq. (1).
The spectral data were also processed by continuum removal to obtain the absorption feature curve of each spectrum (Figure 2).
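The two transformations described in this section can be sketched in a few lines. The sketch assumes a base-10 logarithm for log 1/R and a generic finite-difference derivative (`numpy.gradient`); the exact form of Eq. (1) used in the study is not reproduced here, and the function names and sample values are illustrative.

```python
import numpy as np

def log_inverse_reflectance(reflectance):
    """Return log10(1/R); base 10 is an assumption of this sketch."""
    r = np.asarray(reflectance, dtype=float)
    if np.any(r <= 0):
        raise ValueError("reflectance must be strictly positive")
    return np.log10(1.0 / r)

def first_derivative(values, wavelengths):
    """Finite-difference first derivative with respect to wavelength."""
    values = np.asarray(values, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    return np.gradient(values, wavelengths, axis=-1)

# Illustrative values only (not measured data).
wavelengths = np.array([400.0, 410.0, 420.0, 430.0])
reflectance = np.array([0.10, 0.20, 0.30, 0.40])

absorbance = log_inverse_reflectance(reflectance)
derivative = first_derivative(reflectance, wavelengths)
# A constant baseline offset does not change the derivative.
derivative_shifted = first_derivative(reflectance + 0.05, wavelengths)
```

The last line illustrates why the derivative is said to remove baseline effects: adding a constant offset to the spectrum leaves the derivative unchanged.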

Modeling and prediction methods
The spectral data sets were used to establish models for predicting EC and pH. To verify the predictive ability of the established models, the data set was divided into calibration and validation subsets in a 2:1 ratio, with 48 samples for calibration and a test set of 24 samples for validation. The root mean squared error (RMSE) and the adjusted coefficient of determination (R2) of the predictions were calculated to evaluate the models. The residual prediction deviation (RPD), the ratio of the standard deviation of the measured values to the RMSE of cross validation (RMSEcv), was used as the criterion for evaluating the stability and accuracy of the multivariate models [15]. The best prediction models are characterized by an RPD > 2.0 with R2 of 0.80-1.0. The RPD value is most useful when the validation set is independent of the calibration set.
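As a rough illustration of the evaluation statistics, the sketch below computes RMSE, R2 and RPD for a set of predictions and draws a random 48/24 split. It is a sketch under stated assumptions: the plain (unadjusted) R2 is computed, the sample standard deviation (ddof = 1) is used in RPD, the split is purely random, and all numbers are made up for the example.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Plain coefficient of determination (not the adjusted form)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rpd(y_true, y_pred):
    """Residual prediction deviation: SD of measured values divided by RMSE."""
    return float(np.std(np.asarray(y_true, float), ddof=1) / rmse(y_true, y_pred))

# Random 2:1 split of 72 sample indices into calibration and validation sets.
rng = np.random.default_rng(42)
indices = rng.permutation(72)
calibration_idx, validation_idx = indices[:48], indices[48:]

# Made-up measured and predicted values, for illustration only.
measured = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
scores = {
    "RMSE": rmse(measured, predicted),
    "R2": r_squared(measured, predicted),
    "RPD": rpd(measured, predicted),
}
```

Because RPD divides the spread of the measured values by the prediction error, a model with the same RMSE scores a higher RPD on a data set with a wider range of values.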
PLSR is a regression technique that projects the spectra onto a small number of latent variables and regresses the soil property on them, with the model chosen to minimize the RMSE. In this study, PLSR was used to predict the EC and pH measured in the lab from the spectral matrix. The PLSR calculations were carried out with the PLSR toolbox (R package, version 3.2.1).
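To make the idea of latent variables concrete, the following is a minimal single-response PLS sketch using the NIPALS algorithm in NumPy. It is not the R toolbox used in the study; the function names `pls1_fit` and `pls1_predict` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    Returns (coef, x_mean, y_mean) so that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                # weight: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # scores of the new latent variable
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        q_k = (yk @ t) / tt          # y loading
        Xk = Xk - np.outer(t, p)     # deflate X and y before the next component
        yk = yk - q_k * t
        W.append(w)
        P.append(p)
        q.append(q_k)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    """Predict the response for new spectra with a fitted model."""
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Synthetic stand-in data: 48 calibration and 24 validation samples, 5 "bands".
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(48, 5))
X_val = rng.normal(size=(24, 5))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_cal = X_cal @ beta + 0.7 + rng.normal(0.0, 0.1, size=48)

model = pls1_fit(X_cal, y_cal, n_components=5)
y_val_pred = pls1_predict(model, X_val)
```

With as many components as predictors, as in this toy case, the fit coincides with ordinary least squares. Real spectra have far more bands than samples, so the number of components is kept much smaller and is usually chosen by cross validation.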
SVM has advantages in solving small-sample, nonlinear and high-dimensional pattern recognition problems, and can be extended to function fitting (regression). Through the kernel function, low-dimensional input vectors are mapped into a high-dimensional feature space. The SVM was applied using the R package "e1071", an R interface to the LIBSVM library for support vector machines.
The optimal values of C, ε and the kernel-specific parameters of the SVM were chosen by leave-one-out cross-validation.
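The tuning procedure can be sketched in Python with scikit-learn, whose `SVR` is also built on LIBSVM. This is an analogue of the R workflow, not the authors' code: the RBF kernel, the parameter grid, the standardization step and the synthetic data are all assumptions of the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for 48 calibration spectra with 10 "bands" (not measured data).
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(48, 10))
y_cal = np.sin(X_cal[:, 0]) + 0.5 * X_cal[:, 1] + rng.normal(0.0, 0.05, size=48)

# Standardize the inputs, then fit an epsilon-SVR with an RBF kernel.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical grid over C, epsilon and the kernel-specific parameter gamma.
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__epsilon": [0.01, 0.1],
    "svr__gamma": [0.01, 0.1],
}

# Leave-one-out cross validation: each sample is held out once.
# Mean squared error is used because R2 is undefined for a single held-out sample.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)
search.fit(X_cal, y_cal)

best_model = search.best_estimator_
loo_rmse = float(np.sqrt(-search.best_score_))
```

After the search, `search.best_params_` holds the selected combination, and `best_model` is refitted on all calibration samples so that it can be applied to the independent validation set.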

The characteristic soil spectrum of EC and pH
Once the EC value exceeded 0.1 μS/m, lower soil reflectance was associated with higher EC, consistent with stronger absorption of electromagnetic energy (Figure 3). Except for the sample with the highest pH value (9.5), lower reflectance in the visible region corresponded to lower pH (Figure 4). The absorption feature at 1900 nm was selected to analyze the soil spectral characteristics at different EC and pH levels. As shown in Figures 5 and 6, the absorption width, depth and area increased with increasing EC. The largest absorption width at 1900 nm was recorded for the soil spectrum with a pH of 9.5. For the raw spectrum, the correlation between EC and reflectance was lower than that between pH and reflectance (Figure 6). EC and reflectance were positively correlated over the 400nm-12000nm region, and the highest correlation coefficient was found around 2200 nm. The correlation coefficient between pH and reflectance decreased from 500 nm to 2500 nm, with the highest value of 0.70 in the visible band at 500 nm. For the first derivative spectrum, the sensitive regions with higher correlation coefficients were similar to those of the raw spectrum (Figure 7). Although the correlation coefficient of the LOG-transformed spectrum fluctuated strongly, it was higher than that of the raw and derivative spectra. The highest correlation coefficient was 0.50 for EC at 962 nm and 0.78 for pH at 407 nm (Figure 8).

The validation of modeling results
The optimal input spectral regions for SVM and PLSR modeling were selected according to the relationship between soil EC and reflectance given by the Pearson product-moment correlation coefficient. Three regions with a Pearson correlation coefficient larger than 0.3 were selected as input: 714-2500 nm for the raw spectrum, 480-817 nm for the derivative spectrum, and 645-2500 nm.
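The band selection step can be illustrated with a short sketch that computes Pearson's r band by band and keeps the bands above the 0.3 threshold. The function names and the tiny data set are hypothetical; only the threshold comes from the text.

```python
import numpy as np

def bandwise_pearson(spectra, target):
    """Pearson's r between each spectral band (column) and the soil property."""
    X = np.asarray(spectra, dtype=float)   # shape (n_samples, n_bands)
    y = np.asarray(target, dtype=float)    # shape (n_samples,)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    numerator = Xc.T @ yc
    # A band that is constant across samples would give a zero denominator here.
    denominator = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return numerator / denominator

def select_bands(wavelengths, r, threshold=0.3):
    """Keep the wavelengths whose correlation coefficient exceeds the threshold."""
    return np.asarray(wavelengths)[np.asarray(r) > threshold]

# Tiny made-up example: 5 samples, 4 bands.
wavelengths = np.array([500.0, 1000.0, 1500.0, 2000.0])
ec = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
spectra = np.column_stack([
    0.1 * ec,                                   # perfectly positively correlated
    1.0 - 0.1 * ec,                             # perfectly negatively correlated
    0.3 + 0.1 * np.array([1, -1, 0, -1, 1]),    # uncorrelated
    np.array([0.1, 0.3, 0.2, 0.5, 0.4]),        # moderately correlated
])

r = bandwise_pearson(spectra, ec)
selected = select_bands(wavelengths, r, threshold=0.3)
```

Thresholding r itself keeps only positively correlated bands, which matches the wording used here; thresholding the absolute value instead would also retain strongly negatively correlated bands.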

SVM
Both the preprocessing method and the input spectral region had a great impact on model performance. As shown in Tables 2 and 3, for SVM modeling the raw spectrum and the LOG and first derivative transformations all provided good model performance in cross validation. For each input spectral region, the highest correlation between EC (or pH) and reflectance is higher than 0.3. The SVM model for EC using the sensitive bands with LOG transformation had the highest predictive power, with RPD = 2.7 for cross validation (R2 = 0.86, RMSE = 0.034) and RPD = 2.18 for external validation (R2 = 0.78, RMSE = 0.035) (Figure 9). The SVM model for pH using the sensitive bands of the raw spectrum had the highest predictive power, with RPD > 3 for both cross validation (R2 = 0.95, RMSE = 0.275) and external validation (R2 = 0.91, RMSE = 0.349) (Figure 10).

PLSR
As shown in Tables 4 and 5, for PLSR modeling the raw spectrum and the LOG and first derivative transformations all provided good model performance in cross validation. For each input spectral region, the highest correlation between EC (or pH) and reflectance is higher than 0.3. The PLS model for EC using the sensitive bands of the raw spectrum had the highest predictive power, with RPD > 3 for cross validation (R2 = 0.99, RMSE = 0.001). However, in external validation the RPD was less than 2 (R2 = 0.69, RMSE = 0.039), so the model needs further improvement (Figure 11). The PLS model for pH using the sensitive bands of the raw spectrum had the highest predictive power, with RPD > 3 for cross validation (R2 = 0.99, RMSE = 0.119) and RPD = 2.53 for external validation (R2 = 0.89, RMSE = 0.497) (Figure 12).

Discussion and Conclusion
In this study, the spectral characteristics of the visible and near-infrared spectra of soil samples from semi-arid grassland were described. A nonlinear support vector machine (SVM) and linear partial least squares regression (PLSR) were compared to obtain the best prediction model for EC and pH.
Spectral characteristics reflect how the surface soil changes under different vegetation cover conditions. Soils with higher EC showed lower reflectance. The depth and width of the absorption feature at 1900 nm, which increase with increasing EC and pH, can be used to classify different EC levels. In general, pH is more strongly correlated with reflectance than EC over 350-2500 nm. The higher correlation coefficients for EC were found in the visible range. Melendez-Pastor [2] found that the highest correlation coefficient was 0.70 for a mean EC of 1.75 μS/m in the visible band at 500 nm.
Both the SVM and PLS models performed well in predicting pH and EC, with RPD values larger than 2 in cross validation. For EC, the SVM model gave a slightly lower RMSE (0.035 μS/m) than the PLS model (0.039 μS/m), that is, slightly higher accuracy. One possible reason is that SVM can handle the non-linear part of the spectral data, whereas PLSR can only handle the linear part of the relationship between EC and the spectral data [3].
Milton et al. [16] proposed three key factors in the use of field spectroscopy predictions. Our study addressed one of them: the optimum spectral region for predicting EC in semi-arid grasslands. The preprocessing method and the input spectral region were found to have a great impact on the different modeling approaches. With the optimum spectral region (714-2500 nm) from Pearson's r analysis and LOG transformation as input to the SVM model, both cross validation and external validation of the EC prediction achieved high accuracy. In general, the derivative and LOG transformations improved modeling accuracy. For pH prediction, however, the raw spectrum was more effective than the transformations in both PLSR and SVM modeling.

Figure 1 The sampling location in the land use map
Electrical conductivity and pH are the two indicators tested on the soil samples. At present, the extraction method is commonly used to test the electrical conductivity of soil. The absolute and relative contents of the various salts in the water extract are strongly influenced by the soil-to-water ratio.
In the absorption feature diagram (Figure 2), the horizontal coordinate is the wavelength and the vertical coordinate is the spectral reflectance. An absorption feature of a spectral curve consists of the absorption minimum m and the two shoulders of the absorption (S1 and S2); the line connecting S1 and S2 is called the non-absorption baseline. Different indicators can be used to extract spectral absorption characteristics for different objects. In this paper, the absorption depth D, absorption width W, total absorption area A and symmetry λ are used as the characteristic indicators of the soil spectrum.
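The shape parameters can be sketched as below for a single absorption feature. The sketch assumes a straight-line baseline between two user-supplied shoulder wavelengths, defines width as the full width at half depth on the sampling grid, and omits the symmetry parameter because its definition is not given here. The function name and the synthetic 1900 nm feature are illustrative.

```python
import numpy as np

def absorption_features(wavelengths, reflectance, s1, s2):
    """Depth, width and area of one absorption feature between shoulders s1 and s2."""
    wl = np.asarray(wavelengths, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    inside = (wl >= s1) & (wl <= s2)
    wl_f, refl_f = wl[inside], refl[inside]

    # Non-absorption baseline: straight line connecting the two shoulders.
    baseline = np.interp(wl_f, [wl_f[0], wl_f[-1]], [refl_f[0], refl_f[-1]])
    band_depth = 1.0 - refl_f / baseline   # 0 on the baseline, largest at the minimum

    i_min = int(np.argmax(band_depth))
    depth = float(band_depth[i_min])

    # Area under the band-depth curve (trapezoidal rule).
    area = float(np.sum((band_depth[1:] + band_depth[:-1]) / 2.0 * np.diff(wl_f)))

    # Width at half depth, limited to the resolution of the wavelength grid.
    above_half = wl_f[band_depth >= depth / 2.0]
    width = float(above_half[-1] - above_half[0])

    return {"center": float(wl_f[i_min]), "depth": depth, "width": width, "area": area}

# Synthetic triangular absorption centered at 1900 nm (not measured data).
wavelengths = np.arange(1800.0, 2001.0, 10.0)
triangle = np.clip(1.0 - np.abs(wavelengths - 1900.0) / 50.0, 0.0, None)
reflectance = 0.5 * (1.0 - 0.4 * triangle)

features = absorption_features(wavelengths, reflectance, s1=1800.0, s2=2000.0)
```

Dividing by the baseline before measuring depth makes features from bright and dark soils comparable, which is the purpose of continuum removal.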

Fig. 2 Shape parameters of a spectrum
The Pearson product-moment correlation coefficient was selected to indicate the relationship between soil EC or pH and reflectance, band by band. The Pearson correlation coefficient (Pearson's r) is a measure of the linear correlation between two variables; the larger the absolute value of r, the stronger the correlation. It is computed from two variables X and Y as shown in Eq. (2),

$$ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^{2}}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^{2}}} \qquad (2) $$

where X_i is the reflectance (or transformed reflectance) of soil sample i in a given band, Y_i is the pH or EC value of soil sample i, the bars denote means over the samples, and n is the number of soil samples.

Fig. 3 Hyperspectral reflectance curves of soil with different conductivities
Fig. 4 Hyperspectral reflectance curves of soil with different pH

Fig. 6 Pearson's correlation coefficient between measured EC, pH and reflectance at different bands

Fig. 9 Predicted versus actual values of EC with SVM

Fig. 11 Predicted versus actual values of EC with PLSR

Table 2
Prediction of EC in different spectral transformations of SVM modeling

Table 3
Prediction of pH in different spectral transformations of SVM modeling

Table 4
Prediction of EC in different spectral transformations of PLSR modeling

Table 5
Prediction of pH in different spectral transformations of PLSR modeling