Research on Classification Algorithms for Large-Scale and Complex Agricultural Data Based on Spatial Variability

In practical classification problems, the lack of clear boundary information between the objects being classified easily leads to a loss of classification accuracy. Starting from the spatial patterns of the sample attributes, this article therefore proposes a fuzzy clustering algorithm based on sensitive attribute weights, which uses the attribute weights to improve the ability to separate easily confused samples. The algorithm is applied to the analysis of soil nutrient spatial data collected over consecutive years in Nongan town, and the results of the algorithm are visualized. Experimental results show that introducing weights to describe attribute information reduces the objective function value and effectively alleviates the problem of boundary data that cannot be distinguished, ultimately improving classification accuracy. MATLAB is used to render the results as three-dimensional images. The results provide a basis for improving the accuracy of data classification and cluster analysis of large and complex agricultural data.


Introduction
The arrival of the era of precision agriculture [1,2] has produced a variety of complex linkages among agricultural data, which show apparent spatial variability [3] and correlation. The data are massive, diverse, dynamically changing, incomplete and uncertain, so that the internal links within each attribute are close while the links between attributes are relatively sparse [4]. Data mining can analyze such data effectively. Within data mining, cluster analysis can be used as an independent tool to obtain the distribution of the data, to observe the characteristics of each class, to analyze specific classes further, and finally to extract useful information. However, with the rising importance of data structure information and the exponential growth of the data, traditional data mining algorithms can no longer meet these needs. How to introduce spatial patterns into large-scale agricultural data [5], and how to strengthen the links between attributes for regional management so as to improve parallel and distributed implementation strategies for clustering algorithms [6,7], have gradually attracted researchers' attention [8]. On the basis of the K-means algorithm and the interdependence of spatial unit locations, Li Xiang [9] put forward a new Spatial Contiguous K-Means Cluster algorithm, which removes many fragments and isolated cells and takes into account the continuity of the management partition. Practice shows that the method is suitable for variable field management operations in precision agriculture. Fleming et al. noted that management zones can be defined based on soil properties, terrain and farmers' production experience. Feature selection methods for large-scale data sets then appeared, with the aim of improving data processing efficiency and the rationality of decision-making [10]. Cui proposed a quick association rule mining algorithm for large dense databases based on vertical data [11].
Building on the existing methods, and considering the scalability and efficiency that an algorithm requires when dealing with large data sets, this paper analyzes the influence of spatial variability and structural information on spatio-temporal data and proposes a Fuzzy C-Means algorithm based on attribute weights. After its reliability is verified, the algorithm is analyzed graphically with the MATLAB toolbox to further improve the quality of the clustering results. The results show that introducing spatial variability and structural information can effectively reduce the losses caused by imbalance at the boundary, and that MATLAB plays a direct role in the visual processing.

Sensitive Attribute Weights Fuzzy C-means Analysis
During classification, traditional clustering algorithms are vulnerable to the spatial variability and structural information of the samples, and boundary data are hard to demarcate, which leads to low accuracy [12]. The fuzzy c-means clustering algorithm was therefore introduced to analyze the spatial structure information of the data [13], with the fuzzy similarity matrix constructed directly after the data are standardized. On the premise of having studied and mastered the characteristics and laws of the temporal and spatial variation of the soil, this study applies attribute weights that incorporate spatial variability to the FCM algorithm, ultimately improving the algorithm's classification capability and reducing the loss of classification precision caused by boundary spatio-temporal data.

Construction of Sensitive Attribute Weights
First, we analyzed the spatial patterns of soil nutrients according to the experience of experts in the field and the soil fertility characteristics of the test area [14]. The results showed that the coefficient of variation was 31.12% for available phosphorus, 21.51% for rapidly available potassium and 11.69% for available nitrogen [15]. Second, the spatial coefficient of variation was introduced into the algorithm, and the analytic hierarchy process (AHP) was used to solve for the weight coefficients. The specific steps are as follows: (1) construct the pairwise comparison matrix; (2) select an arbitrary normalized n-dimensional initial vector w(0);
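The two steps listed above match the opening of the power (eigenvector) method commonly used with AHP. A minimal Python sketch of that method follows, under stated assumptions: the pairwise comparison matrix is built here from ratios of the three reported coefficients of variation, which is an illustrative choice and not necessarily how the authors constructed theirs, and the names `ahp_weights`, `cv` and `pairwise` are hypothetical.

```python
def ahp_weights(matrix, tol=1e-10, max_iter=1000):
    """Approximate the principal eigenvector of a pairwise comparison
    matrix by power iteration and return it normalized to sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n  # normalized initial vector w(0)
    for _ in range(max_iter):
        # Multiply the matrix by the current weight vector.
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]  # renormalize so the weights sum to 1
        if max(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


# Coefficients of variation reported in the text (percent).
cv = {"available_P": 31.12, "available_K": 21.51, "available_N": 11.69}
names = list(cv)

# ASSUMPTION (not from the paper): entry (i, j) is the ratio CV_i / CV_j.
pairwise = [[cv[a] / cv[b] for b in names] for a in names]

weights = dict(zip(names, ahp_weights(pairwise)))
for name, value in weights.items():
    print(f"{name}: {value:.4f}")
```

Because this ratio matrix is perfectly consistent, the resulting weights are simply proportional to the coefficients of variation. A matrix filled in from expert judgment would generally not be consistent, which is the case where the iteration does real work.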

Construction of Attribute Weights Fuzzy C-means Algorithm
When dealing with clustering problems that involve fuzzy concepts [16,17], each sample is not strictly assigned to one class but belongs to each category with a certain degree of membership. The objective function is defined on the membership degree matrix [18]: it is the sum of the weighted squared distances between the samples and the cluster centers. The optimal classification is the one that minimizes the objective function. If all points of a class are close to the center of that class, the value of the objective function is very small.
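The description above can be made concrete with a short Python sketch. It assumes a common form of the objective, J = sum over clusters i and samples k of u_ik^m * sum over attributes j of w_j * (x_kj - v_ij)^2, in which the attribute weights w_j enter the squared distance. The source does not give its exact formula, so this form, the toy data, and the names `weighted_sq_dist` and `objective` are assumptions.

```python
def weighted_sq_dist(x, v, w):
    """Attribute-weighted squared Euclidean distance between sample x and center v."""
    return sum(wj * (xj - vj) ** 2 for wj, xj, vj in zip(w, x, v))


def objective(samples, centers, u, w, m=2.0):
    """Sum over clusters i and samples k of u[i][k]**m times the weighted squared distance."""
    return sum(
        (u[i][k] ** m) * weighted_sq_dist(x, v, w)
        for i, v in enumerate(centers)
        for k, x in enumerate(samples)
    )


# Toy data (made up for illustration): two tight groups described by two attributes.
samples = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.1, 4.9)]
w = (0.7, 0.3)  # hypothetical attribute weights
u = [[1.0, 1.0, 0.0, 0.0],  # memberships in cluster 0 (crisp, for clarity)
     [0.0, 0.0, 1.0, 1.0]]  # memberships in cluster 1

# Centers placed inside each group versus centers placed far from both groups.
near = objective(samples, [(1.1, 0.95), (5.05, 5.0)], u, w)
far = objective(samples, [(3.0, 3.0), (3.0, 3.0)], u, w)
print(near < far)
```

The comparison illustrates the last sentence of the paragraph: when the points of each class lie close to their center, the objective value is much smaller than when the centers sit away from the data.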

MATLAB Modeling Tool
MATLAB is an interactive programming language based on matrix manipulation. Its main functions include data analysis, numerical calculation and engineering drawing [19], and it can also annotate and print graphics. This paper uses MATLAB to process and analyze the data, compares the proposed algorithm with the traditional fuzzy c-means algorithm, compares the accuracy of the algorithms through the objective function value, and realizes the 3D visualization of the data.

Data Sources
Soil nutrient information for Nongan town was acquired using 3S technology (GPS, GIS and RS) and sensor technology, and the sampling points were positioned according to the geolocation of the arable land. The main factors affecting soil fertility [20] (nitrogen, phosphorus and potassium) were selected as the sample data for the research. The sampling distribution is as follows:

Fig.1 Soil sampling points in Nongan town
The plum-blossom sampling method shown in Fig. 1 is a five-point sampling method: the soil taken at the four corners and at the center of a grid cell is mixed to form the soil sample for that cell. Part of the data collected from farmers in the town in 2010 is shown in Table 1 as an example.

Data Processing
The soil nutrient data of Nongan town from 2008 to 2012 were processed. Random numbers uniformly distributed on [0,1] were used to determine the initial membership degree matrix, and the cluster centers were then determined iteratively. The cluster centers at iteration step l are computed from the current membership degrees, where c is the number of classes and m > 1.
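A minimal Python sketch of this initialization and of one center update follows. It assumes the standard fuzzy c-means center formula, v_i = sum_k u_ik^m x_k / sum_k u_ik^m, and normalizes each sample's random memberships so that they sum to 1. The source's own equation is not reproduced here, the sample values are made up, and the function names are hypothetical.

```python
import random


def init_membership(c, n, rng):
    """Draw uniform random numbers on [0, 1] and normalize each sample's
    column so that its memberships over the c classes sum to 1."""
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    return u


def update_centers(samples, u, m):
    """Standard FCM center update: mean of the samples weighted by u**m."""
    dims = len(samples[0])
    centers = []
    for row in u:
        powered = [uik ** m for uik in row]
        denom = sum(powered)
        centers.append(tuple(
            sum(p * x[j] for p, x in zip(powered, samples)) / denom
            for j in range(dims)
        ))
    return centers


# Made-up (N, P, K) values for illustration only; not the Nongan data.
samples = [
    (110.0, 12.0, 130.0), (118.0, 15.0, 125.0), (150.0, 28.0, 190.0),
    (155.0, 31.0, 200.0), (132.0, 20.0, 160.0), (129.0, 22.0, 158.0),
]

rng = random.Random(0)  # fixed seed so the run is repeatable
u0 = init_membership(3, len(samples), rng)
centers = update_centers(samples, u0, m=2.0)
for v in centers:
    print(tuple(round(vj, 2) for vj in v))
```

Each center is a convex combination of the samples, so it always lies within the range of the data in every attribute. With purely random memberships the first centers sit close to the overall mean and only separate in later iterations.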

Application and Analysis of Algorithms
The data were preprocessed and combined with the analysis of the spatial variation of the soil nutrients. The algorithm adjusts the objective function value through continuous iteration in order to classify soil fertility. Clustering the objects on the basis of their continuity in time and space, and after processing the sample data, we used the sensitive attribute weights fuzzy C-means algorithm to analyze the data from 2008 to 2012. Experiments show that the clustering result is clear when the membership degree exponent is 8. With the same exponent value, repeated experiments found that both the accuracy and the operational efficiency of the improved algorithm are higher than those of the traditional fuzzy C-means clustering algorithm. The results for 2011 are shown in Table 2. Table 2 shows that, under the same conditions, the objective function value of the improved algorithm is smaller, its relative accuracy is 21.7% higher, and its operating efficiency is also higher. This is because the sample edges have no clear demarcation point, a problem that Fuzzy C-Means mitigates when processing data. On this basis, the regularity of spatial variation is introduced, so that better management areas are delineated without compromising the classification results.
Based on the above analysis, the data were processed with the MATLAB visualization toolbox; the results are shown in Fig. 2 and Table 3. Table 3 and Fig. 2 show that, after years of continuous precise fertilization, the similarity of the data gradually improves, the discrepancy between categories gradually becomes smaller, and the differences in soil fertility level off. This shows that, after precise fertilization, the integrated similarity of the Alkaline-N, Olsen-P and Olsen-K nutrient data of the plot increased year by year. It also shows that the attribute weights fuzzy C-means clustering algorithm is suitable for the evaluation of soil fertility.
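The iterative procedure described in this section can be sketched in Python as follows. This is the textbook fuzzy c-means alternation with an attribute-weighted squared distance, not the authors' MATLAB implementation. The data and weights are made up, the function name `fcm` is hypothetical, and the example uses an exponent of m = 2 for simplicity, whereas the paper reports using 8. The sketch does not reproduce the reported accuracy or efficiency figures.

```python
import random


def fcm(samples, c, w, m=2.0, iters=50, seed=0):
    """Attribute-weighted fuzzy c-means; returns centers, memberships and
    the objective value recorded after every iteration."""
    rng = random.Random(seed)
    n, dims = len(samples), len(samples[0])
    # Random initial memberships on [0, 1], normalized per sample.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    history, centers = [], []
    exponent = 1.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: mean of the samples weighted by u**m.
        centers = []
        for row in u:
            p = [uik ** m for uik in row]
            total = sum(p)
            centers.append([
                sum(pk * x[j] for pk, x in zip(p, samples)) / total
                for j in range(dims)
            ])
        # Weighted squared distances, floored to avoid division by zero.
        d2 = [[max(sum(wj * (xj - vj) ** 2 for wj, xj, vj in zip(w, x, v)), 1e-12)
               for x in samples] for v in centers]
        # Membership update: inverse distances raised to 1/(m-1), normalized.
        for k in range(n):
            inv = [d2[i][k] ** -exponent for i in range(c)]
            s = sum(inv)
            for i in range(c):
                u[i][k] = inv[i] / s
        history.append(sum(u[i][k] ** m * d2[i][k]
                           for i in range(c) for k in range(n)))
    return centers, u, history


# Made-up (N, P, K) samples forming two well-separated groups.
samples = [
    (100.0, 10.0, 120.0), (102.0, 11.0, 118.0), (98.0, 9.5, 122.0),
    (160.0, 30.0, 200.0), (158.0, 31.0, 203.0), (162.0, 29.0, 198.0),
]
weights = (0.18, 0.48, 0.34)  # hypothetical attribute weights for N, P, K

centers, u, history = fcm(samples, 2, weights)
labels = [max(range(2), key=lambda i: u[i][k]) for k in range(len(samples))]
print(labels)
print(history[0], history[-1])
```

Each half-step of the alternation minimizes the objective with the other set of variables held fixed, so the recorded objective values do not increase from one iteration to the next. That is the quantity the paper uses to compare the improved and traditional algorithms.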

Conclusion
The attribute weights fuzzy c-means algorithm was used to analyze and evaluate the soil nutrient data of Nongan town, Nongan county, for five consecutive years (2008-2012). The test results show that, after five consecutive years of precise fertilization, soil fertility conditions changed markedly. The attribute weights c-means clustering algorithm is an effective method for studying and evaluating soil fertility, and its results are consistent with the trend of change in soil fertility on the farmers' land.
First, the algorithm considers the spatial variability of soil fertility and uses AHP to determine the sensitive attribute weights. For the originally scattered data, it not only retains the traditional algorithm's use of fuzzy sets to handle boundary points that are difficult to deal with, but also overcomes the imbalance between the attributes and the sensitivity to "noise" and outlier data.
Second, the paper used the sensitive attribute weights fuzzy c-means algorithm to cluster the 2011 Nongan soil data, which comprised three nutrients: alkaline hydrolysis nitrogen, available phosphorus and available potassium. The results show that the relative accuracy of the algorithm was 21.7% higher than that of the traditional algorithm and its efficiency was 17% higher, so the improved algorithm clusters better.
Third, the algorithm was used to analyze soil nutrient data from five consecutive years of precision fertilization. The results show that the integrated similarity of the alkaline hydrolysis nitrogen, available phosphorus and available potassium data for the whole plot increased year by year. The experimental results are consistent with the actual state of soil fertility, which provides a new reference for future analysis of soil fertility status.
Fuzzy clustering is an inherently imprecise concept, so both clustering algorithms should be iterated repeatedly for a given membership degree exponent in the objective function in order to find a clustering that is relatively close to the true one. MATLAB can handle large-scale data and produce visual clustering results. The agricultural data covered in this article are mostly soil nutrient data of a single type. In the face of increasingly complex and massive agricultural data, the original matrix processing mode is not sufficient. Future work should focus on application testing with large data sets in order to confirm the validity of the algorithm for clustering massive data.

Fig.2 The three-dimensional clustering figure

Table 1. The sample data

Table 3. The clustering results