p-Spectral Clustering Based on Neighborhood Attribute Granulation

Clustering analysis is an important method for data mining and information statistics. Data clustering aims to find the intrinsic links between objects and describe the internal structures of data sets. p-Spectral clustering is based on the Cheeger cut criterion and performs well on many challenging data sets. However, the original p-spectral clustering algorithm is not suitable for high-dimensional data. To solve this problem, this paper improves p-spectral clustering using neighborhood attribute granulation and proposes the NAG-pSC algorithm. Neighborhood rough sets can directly process continuous data. We introduce information entropy into neighborhood rough sets to weaken the negative impact of noise data and redundant attributes on clustering. In this way, data points within the same cluster become more compact, while data points in different clusters become more separated. The effectiveness of the proposed NAG-pSC algorithm is tested on several benchmark data sets. Experiments show that neighborhood attribute granulation highlights the differences between data points while maintaining their characteristics in the clustering. With the help of neighborhood attribute granulation, NAG-pSC is able to recognize more complex data structures and is strongly robust to noise or irrelevant features in high-dimensional data.


Introduction
Spectral clustering treats the clustering problem as a graph partitioning problem. It solves the graph cut objective function using the eigenvectors of the graph Laplacian matrix [1]. Compared with conventional clustering algorithms, spectral clustering is able to recognize more complex data structures and is especially suitable for non-convex data sets. Recently, an improved version of the normalized cut, named the Cheeger cut, has attracted much attention [2]. Research shows that the Cheeger cut is able to produce more balanced clusters through the graph p-Laplacian [3]. The p-Laplacian is a nonlinear generalization of the graph Laplacian.
p-Spectral clustering uses the Cheeger cut to group data points. As it has a solid theoretical foundation and good clustering results, research in this area is currently very active. Dhanjal et al. present an incremental spectral clustering method which updates the eigenvectors of the Laplacian in a computationally efficient way [4]. Gao et al. construct a sparse affinity graph on a small representative dataset and use local interpolation to improve the extension of the clustering results [5]. Semertzidis et al. inject pairwise constraints into a small affinity sub-matrix and use the sparse coding strategy of landmark spectral clustering to preserve low complexity [6].
Nowadays, science and technology are growing by leaps and bounds, and massive data result in a "data explosion". These data are often high-dimensional. When dealing with high-dimensional data, some clustering algorithms that perform well in low-dimensional data spaces often fail to get good clustering results, or even become invalid [7]. Attribute reduction is an effective way to decrease the size of data, and it is often used as a preprocessing step for data mining. The essence of attribute reduction is to remove irrelevant or unnecessary attributes while maintaining the classification ability of the knowledge base. Efficient attribute reduction can not only improve the clarity of knowledge in intelligent information systems, but also reduce the cost of information systems to some extent. In order to deal effectively with high-dimensional data, we design a novel attribute reduction method based on neighborhood granulation and combine it with p-spectral clustering. The proposed algorithm inherits the advantages of neighborhood rough sets and the graph p-Laplacian. Its effectiveness is demonstrated by comprehensive experiments on benchmark data sets.
This paper is organized as follows: section 2 introduces p-spectral clustering; section 3 uses information entropy to improve the attribute reduction based on neighborhood rough sets; section 4 improves p-spectral clustering with the neighborhood attribute granulation; section 5 verifies the effectiveness of the proposed algorithm using benchmark data sets; finally, we summarize the main contribution of this paper.

p-Spectral Clustering
The idea of spectral clustering comes from spectral graph partition theory. Given a data set, we can construct an undirected weighted graph G = (V, E), where V is the set of vertices represented by data points and E is the set of edges, each weighted by the similarity between its two vertices. Suppose A is a subset of V; its complement is written as $\bar{A} = V \setminus A$. The cut of A and $\bar{A}$ is defined as:
$$\operatorname{cut}(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} w_{ij} \tag{1}$$
where $w_{ij}$ is the similarity between vertex i and vertex j.
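As a minimal sketch of formula (1), assuming the graph is stored as a dense symmetric NumPy similarity matrix `W` and the subset A as a boolean mask (both representation choices are ours, not the paper's), the cut sums the weights of every edge that crosses from A to its complement:

```python
import numpy as np

def cut(W, A):
    """Formula (1): sum of w_ij over i in A and j in the complement of A."""
    A = np.asarray(A, dtype=bool)
    return float(W[np.ix_(A, ~A)].sum())

# Two tight pairs {0, 1} and {2, 3} joined by weak edges of weight 0.1.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
A = np.array([True, True, False, False])
print(cut(W, A))   # 0.2
print(cut(W, ~A))  # 0.2, since W is symmetric
```

Only the two weak edges cross the partition, so the cut is small; a split that separated vertex 0 from vertex 1 would cut a weight-1.0 edge instead.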
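In order to get more balanced clusters, Cheeger et al. propose the Cheeger cut criterion, denoted as Ccut [8]:
$$\operatorname{Ccut}(A, \bar{A}) = \frac{\operatorname{cut}(A, \bar{A})}{\min\{|A|, |\bar{A}|\}} \tag{2}$$
where |A| is the number of data points in set A. The Cheeger cut approach minimizes formula (2) to obtain a graph partition. The optimal graph partition means that the similarities within a cluster are as large as possible, while the similarities between clusters are as small as possible. However, computing the optimal Cheeger cut exactly is an NP-hard problem, so, following the Rayleigh quotient principle, we look for a relaxed solution. Next we obtain an approximate solution of the Cheeger cut by introducing the p-Laplacian into spectral clustering. Hein et al. define the inner product form of the graph p-Laplacian $\Delta_p$ as follows [9]:
$$\langle f, \Delta_p f \rangle = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \, |f_i - f_j|^p \tag{3}$$
where $p \in (1, 2]$ and $f \in \mathbb{R}^n$ is a function on the vertices, such as an eigenvector of the p-Laplacian.
Theorem 1. For p > 1 and every partition of V into A and $\bar{A}$, there exists a function $f_{p,A}$ such that the functional $F_p$ associated to the p-Laplacian satisfies
$$F_p(f_{p,A}) = \operatorname{cut}(A, \bar{A}) \left( \frac{1}{|A|^{\frac{1}{p-1}}} + \frac{1}{|\bar{A}|^{\frac{1}{p-1}}} \right)^{p-1} \tag{4}$$
where $F_p(f) = \langle f, \Delta_p f \rangle / \|f\|_p^p$. The expression (4) can be interpreted as a balanced graph cut criterion, and we have the special cases
$$F_2(f_{2,A}) = \operatorname{RCut}(A, \bar{A}), \qquad \lim_{p \to 1} F_p(f_{p,A}) = \operatorname{Ccut}(A, \bar{A}) \tag{5}$$
where $\operatorname{RCut}(A, \bar{A}) = \operatorname{cut}(A, \bar{A}) \left( 1/|A| + 1/|\bar{A}| \right)$ is the ratio cut. Theorem 1 shows that as p approaches 1, the functional $F_p$ approaches the Cheeger cut, so the Cheeger cut can be approximated with the p-Laplacian operator. The minimizer of $F_p(f)$ is therefore a relaxed approximate solution of the Cheeger cut, and it can be obtained by the eigen-decomposition of the p-Laplacian:
$$(\Delta_p f)_i = \lambda_p \, \varphi_p(f_i), \qquad \varphi_p(x) = |x|^{p-1} \operatorname{sign}(x) \tag{6}$$
where $(\Delta_p f)_i = \sum_j w_{ij} \, \varphi_p(f_i - f_j)$ and $\lambda_p$ is the eigenvalue corresponding to eigenvector f. Specifically, the second eigenvector $v_p^{(2)}$ of the p-Laplacian leads to a bipartition of the graph by setting an appropriate threshold [3]. The optimal threshold is determined by minimizing the corresponding Cheeger cut. For the second eigenvector $v_p^{(2)}$ of the graph p-Laplacian $\Delta_p$, the threshold should satisfy:
$$\arg\min_{A_t = \{ i \in V \,:\, v_p^{(2)}(i) > t \}} \operatorname{Ccut}(A_t, \bar{A}_t) \tag{7}$$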
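The following sketch checks Theorem 1 numerically and implements the threshold sweep of formula (7), assuming a dense NumPy affinity matrix. The two-valued function `f_pA` is the one used in Bühler and Hein's proof: it equals $|A|^{-1/(p-1)}$ on A and $-|\bar{A}|^{-1/(p-1)}$ on the complement. All function names are our own, and the sweep simply tries every entry of the vector as the threshold t.

```python
import numpy as np

def cut(W, A):
    """Formula (1): total weight of edges between A and its complement."""
    A = np.asarray(A, dtype=bool)
    return float(W[np.ix_(A, ~A)].sum())

def ccut(W, A):
    """Formula (2): cut(A, A_bar) / min(|A|, |A_bar|)."""
    A = np.asarray(A, dtype=bool)
    return cut(W, A) / min(int(A.sum()), int((~A).sum()))

def F_p(W, f, p):
    """F_p(f) = <f, Delta_p f> / ||f||_p^p with the inner product of formula (3)."""
    diff = np.abs(f[:, None] - f[None, :]) ** p
    return 0.5 * float((W * diff).sum()) / float((np.abs(f) ** p).sum())

def f_pA(A, p):
    """Two-valued function from the proof of Theorem 1."""
    A = np.asarray(A, dtype=bool)
    a = A.sum() ** (-1.0 / (p - 1))
    b = (~A).sum() ** (-1.0 / (p - 1))
    return np.where(A, a, -b)

def best_threshold(W, v):
    """Formula (7): try every t among the entries of v (except the largest,
    which would leave A_t empty) and keep the split with the smallest Ccut."""
    best_t, best_val = None, np.inf
    for t in np.unique(v)[:-1]:
        val = ccut(W, v > t)
        if val < best_val:
            best_t, best_val = float(t), val
    return best_t, best_val

# Toy graph: two tight pairs {0, 1} and {2, 3} joined by weak edges.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
A = np.array([True, True, False, False])
for p in (2.0, 1.5, 1.1):
    rhs = cut(W, A) * (A.sum() ** (-1 / (p - 1)) + (~A).sum() ** (-1 / (p - 1))) ** (p - 1)
    # Left and right sides of formula (4) agree up to floating-point rounding.
    print(p, F_p(W, f_pA(A, p), p), rhs)
print(ccut(W, A))                                           # 0.1
print(best_threshold(W, np.array([0.6, 0.5, -0.5, -0.6])))  # (-0.5, 0.1)
```

At p = 2 the value equals the ratio cut 0.2 for this balanced split. The sweep recovers the pair split because every other threshold cuts a weight-1.0 edge.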

Neighborhood Attribute Granulation
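Rough set theory was proposed by Professor Pawlak in 1982 [10]. Attribute reduction is one of the core topics of rough set knowledge discovery. However, the Pawlak rough set is only suitable for discrete data. To solve this problem, Hu et al. propose the neighborhood rough set model [11]. This model can directly analyze attributes with continuous values. Therefore, it has great advantages in feature selection and classification accuracy.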
According to the nature of the lower approximation, we can define the dependence of decision attribute D on condition attribute set B:
$$\gamma_B(D) = \frac{|POS_B(D)|}{|U|}$$
where $0 \le \gamma_B(D) \le 1$. Obviously, the larger the positive region $POS_B(D)$, the stronger the dependence of decision D on condition B.
Definition 3. Given a neighborhood decision system NDT = ⟨U, A, D⟩, $B \subseteq A$ and $a \in A - B$, the significance degree of a relative to B is defined as:
$$SIG(a, B, D) = \gamma_{B \cup \{a\}}(D) - \gamma_B(D)$$
However, several attributes may sometimes have the same greatest significance degree. Traditional reduction algorithms randomly choose one of them. This choice is arbitrary, ignores the impact of other factors on attribute selection, and may lead to poor reduction results. Analyzing attribute reduction from the viewpoint of information theory can improve the reduction accuracy [12]. Here, we use information entropy as another criterion to evaluate attributes. The definition of entropy is given below.
Definition 4. Given knowledge P and its partition $U/P = \{X_1, X_2, \ldots, X_n\}$ on domain U, the information entropy of knowledge P is defined as:
$$H(P) = -\sum_{i=1}^{n} p(X_i) \log p(X_i)$$
where $p(X_i) = |X_i| / |U|$ represents the probability of equivalence class $X_i$ on domain U.
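Definitions 1 and 2 (the δ-neighborhood and the lower approximation) are not reproduced in this section. The sketch below therefore assumes the usual neighborhood rough set form: the δ-neighborhood of $x_i$ on attribute set B uses Euclidean distance on those columns, and $x_i$ belongs to $POS_B(D)$ when every sample in its neighborhood shares its decision label. It also assumes $\gamma_\emptyset(D) = 0$ and takes the logarithm in Definition 4 to base 2. The function names are hypothetical.

```python
import numpy as np

def neighborhood_matrix(X, B, delta):
    """N[i, j] is True when x_j lies in the delta-neighborhood of x_i,
    measured by Euclidean distance on the attribute columns B (assumption)."""
    XB = X[:, B]
    dist = np.sqrt(((XB[:, None, :] - XB[None, :, :]) ** 2).sum(axis=2))
    return dist <= delta

def dependency(X, y, B, delta):
    """gamma_B(D) = |POS_B(D)| / |U|; gamma of the empty set taken as 0."""
    if len(B) == 0:
        return 0.0
    N = neighborhood_matrix(X, list(B), delta)
    # x_i is in POS_B(D) if all of its neighbors share its decision label.
    in_pos = np.array([np.all(y[N[i]] == y[i]) for i in range(len(y))])
    return float(in_pos.mean())

def significance(X, y, a, B, delta):
    """Definition 3: SIG(a, B, D) = gamma_{B + {a}}(D) - gamma_B(D)."""
    return dependency(X, y, list(B) + [a], delta) - dependency(X, y, list(B), delta)

def entropy(blocks):
    """Definition 4 with log base 2: blocks[i] names the class X_j holding object i."""
    _, counts = np.unique(blocks, return_counts=True)
    prob = counts / counts.sum()
    return float(-(prob * np.log2(prob)).sum())

# Attribute 0 separates the two decision classes; attribute 1 mixes them.
X = np.array([[0.0, 0.0], [0.1, 0.9], [0.9, 0.1], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(dependency(X, y, [0], 0.2), dependency(X, y, [1], 0.2))  # 1.0 0.0
print(significance(X, y, 0, [], 0.2))                         # 1.0
print(entropy(y))                                             # 1.0
```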
If multiple attributes have the same greatest significance degree, we compare their information entropy and select the attribute with the minimum entropy, because it carries the least uncertain information. We then add the selected attribute to the reduction set and repeat this process until the reduction set no longer changes. This improved attribute reduction algorithm is shown as Algorithm 1.
Algorithm 1. Neighborhood attribute granulation with information entropy.
Input: Neighborhood decision system NDT = ⟨U, A, D⟩.
Output: The reduced attribute set red.
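The individual steps of Algorithm 1 are not recoverable here. The code below is therefore only a sketch of the greedy forward selection described in the preceding paragraphs, using the same assumed δ-neighborhood as before. Two further choices are assumptions. First, because a continuous attribute does not induce a partition directly, its entropy is computed on an equal-width binning into `n_bins` intervals (a hypothetical parameter, with values assumed scaled to [0, 1]). Second, the loop stops when no remaining attribute has positive significance.

```python
import numpy as np

def dependency(X, y, B, delta):
    """gamma_B(D) under the assumed delta-neighborhood (Euclidean distance on B)."""
    if not B:
        return 0.0  # convention: the empty attribute set gives no dependency
    XB = X[:, B]
    dist = np.sqrt(((XB[:, None, :] - XB[None, :, :]) ** 2).sum(axis=2))
    N = dist <= delta
    consistent = [bool(np.all(y[N[i]] == y[i])) for i in range(len(y))]
    return float(np.mean(consistent))

def binned_entropy(col, n_bins=5):
    """Entropy of the partition induced by equal-width bins (hypothetical choice,
    assumes attribute values scaled to [0, 1])."""
    blocks = np.minimum((col * n_bins).astype(int), n_bins - 1)
    _, counts = np.unique(blocks, return_counts=True)
    prob = counts / counts.sum()
    return float(-(prob * np.log2(prob)).sum())

def reduce_attributes(X, y, delta=0.3, n_bins=5, eps=1e-12):
    """Greedy forward selection; ties in significance broken by minimum entropy."""
    red, remaining = [], list(range(X.shape[1]))
    gamma_red = 0.0
    while remaining:
        sig = {a: dependency(X, y, red + [a], delta) - gamma_red for a in remaining}
        best = max(sig.values())
        if best <= eps:  # no attribute raises the dependency, so red stops changing
            break
        ties = [a for a in remaining if abs(sig[a] - best) <= eps]
        chosen = min(ties, key=lambda a: binned_entropy(X[:, a], n_bins))
        red.append(chosen)
        remaining.remove(chosen)
        gamma_red = dependency(X, y, red, delta)
    return red

# Attributes 0 and 1 both separate the classes (a tie); attribute 2 is noise.
X = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.0, 1.0],
              [0.7, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(reduce_attributes(X, y, delta=0.2))  # [1]
```

Attribute 1 wins the tie because its values fall into only two bins (entropy 1 bit), whereas attribute 0 spreads over four bins (2 bits). The default `delta=0.3` lies inside the range [0.2, 0.4] recommended by Hu et al.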

p-Spectral Clustering Based on Neighborhood Attribute Granulation
Processing massive high-dimensional data has been a challenging problem in data mining. High-dimensional data is often accompanied by the "curse of dimensionality", so traditional p-spectral clustering algorithms cannot fully play to their strengths. Moreover, real data sets often contain noise and irrelevant features, which are likely to cause a "dimension trap". These interfere with the clustering process and affect the accuracy of the clustering results [13]. To solve this problem, we propose a novel p-spectral clustering algorithm based on neighborhood attribute granulation (NAG-pSC).
The detailed steps of the NAG-pSC algorithm are given in Algorithm 2.
Algorithm 2. p-Spectral clustering algorithm based on neighborhood attribute granulation.
Input: the data set and the cluster number k.
Output: k divided clusters.
Step 1. Reduce the attributes of data points according to Algorithm 1 and obtain the reduced attribute set red.
Step 2. After attribute granulation, calculate the similarities between data points based on the new data set restricted to red, and form the affinity matrix $W \in \mathbb{R}^{n \times n}$ using a Gaussian kernel.
Step 3. Initialize the first cluster $A_1 = V$ and set the cluster number s = 1.
Step 4. Repeat Step 5 to Step 8.
Step 5. Construct the p-Laplacian according to formula (3) with the affinity matrix W.
Step 6. Calculate the second eigenvector $v_p^{(2)}$ of the graph p-Laplacian $\Delta_p$, and search for an appropriate threshold value that satisfies formula (7).
Step 9. When the number of clusters s = k, stop the loop and output the clustering results.
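The following is a rough sketch of the control flow of Algorithm 2, not the authors' implementation. It takes the reduced attribute set `red` from Algorithm 1 as given and builds a Gaussian affinity with a hypothetical width `sigma`. It then repeatedly bipartitions one cluster with the Cheeger cut threshold sweep of formula (7). Two substitutions are labeled in the code. The second eigenvector comes from the ordinary p = 2 Laplacian D - W, which is the usual starting point of p-Laplacian methods, rather than from the p < 2 operator, because computing that eigenvector requires a nonlinear solver. Steps 7 and 8 are missing from the text, so the rule "split the currently largest cluster" is also our assumption.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Step 2: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def ccut(W, A):
    """Formula (2) for a boolean mask A."""
    return W[np.ix_(A, ~A)].sum() / min(A.sum(), (~A).sum())

def bipartition(W):
    """Steps 5-6 with a stand-in: second eigenvector of the ordinary (p = 2)
    Laplacian D - W, then the Cheeger cut threshold sweep of formula (7)."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    v = vecs[:, 1]
    best_A, best_val = None, np.inf
    for t in np.unique(v)[:-1]:
        A = v > t
        val = ccut(W, A)
        if val < best_val:
            best_A, best_val = A, val
    return best_A

def nag_psc_sketch(X, red, k, sigma=1.0):
    """Recursive bipartition on the reduced attributes `red` (Algorithm 1 output)."""
    W = gaussian_affinity(X[:, red], sigma)    # Steps 1-2
    labels = np.zeros(X.shape[0], dtype=int)    # Step 3: A_1 = V, s = 1
    s = 1
    while s < k:                                # Step 9: stop once s = k
        # Hypothetical rule: split the currently largest cluster.
        target = np.bincount(labels).argmax()
        idx = np.flatnonzero(labels == target)
        if idx.size < 2:
            break
        A = bipartition(W[np.ix_(idx, idx)])
        labels[idx[~A]] = s                     # the complement side becomes cluster s
        s += 1
    return labels

# Two blobs in attributes 0-1; attribute 2 is noise that Algorithm 1 would drop.
X = np.array([[0.0, 0.0, 5.0], [0.1, 0.0, -5.0], [0.0, 0.1, 3.0],
              [3.0, 3.0, -3.0], [3.1, 3.0, 4.0], [3.0, 3.1, -4.0]])
print(nag_psc_sketch(X, red=[0, 1], k=2))
```

Which blob receives label 0 depends on the sign returned by the eigensolver, so only the grouping, not the label values, is meaningful.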

Experimental Analysis
To test the effectiveness of the proposed NAG-pSC algorithm, we use six benchmark data sets in the experiments. The characteristics of these data sets are shown in Table 1.
In this paper, we use the F-measure to evaluate the clustering results [14]. The F-score of each class i and the total F index of the clustering results are defined as:
$$F(i) = \frac{2 P(i) R(i)}{P(i) + R(i)}, \qquad F = \frac{1}{n} \sum_{i=1}^{k} N_i \, F(i)$$
where $P(i) = N_{ii^*} / N_{i^*}$ is the precision rate and $R(i) = N_{ii^*} / N_i$ is the recall rate; $N_{ii^*}$ is the size of the intersection of class i and cluster i*; $N_i$ is the size of class i; $N_{i^*}$ is the size of cluster i*; n is the number of data points; and k is the class number.
$F \in [0, 1]$; the greater the F index, the closer the clustering results of the algorithm are to the real data categories.
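The formulas above do not say how cluster i* is paired with class i. This sketch assumes the common convention of taking, for each class, the cluster with the highest F-score. It then weights each class score by the class size $N_i$, so the result does not depend on how clusters are numbered. The function name is hypothetical.

```python
import numpy as np

def f_measure(classes, clusters):
    """Total F index: F = (1/n) * sum_i N_i * F(i), where F(i) is the best
    F-score of class i over all clusters (assumed choice of cluster i*)."""
    classes, clusters = np.asarray(classes), np.asarray(clusters)
    n = classes.size
    total = 0.0
    for c in np.unique(classes):
        in_class = classes == c
        N_i = in_class.sum()
        best = 0.0
        for g in np.unique(clusters):
            in_cluster = clusters == g
            N_ii = np.sum(in_class & in_cluster)   # size of class i within cluster g
            if N_ii == 0:
                continue
            P = N_ii / in_cluster.sum()            # precision P(i)
            R = N_ii / N_i                         # recall R(i)
            best = max(best, 2 * P * R / (P + R))
        total += N_i * best
    return total / n

print(f_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: a label permutation does not matter
print(f_measure([0, 0, 0, 1], [0, 0, 1, 1]))  # about 0.7667 (23/30)
```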
In the experiment, the NAG-pSC algorithm is compared with traditional spectral clustering (SC), density sensitive spectral clustering (D-SC) [1] and p-spectral clustering (pSC) [3]. The threshold δ is important in the neighborhood rough set. Hu et al. recommend a value range of [0.2, 0.4] for δ based on experimental analysis [11]. So we set the neighborhood size δ via a cross-validatory search in the range [0.2, 0.4] (with step size 0.05) for each data set. The clustering results of these four algorithms are shown in Figure 1. The horizontal axis of the figure is the cluster label, and the vertical axis is the F-score of each cluster. From Figure 1 we can see that the performance of the SC algorithm is close to that of the D-SC algorithm. This is mainly because both are based on graph theory and turn the clustering problem into a graph partitioning problem. Using the p-Laplacian, pSC may find a solution closer to the global optimum. SC works well on the Sonar data set, D-SC deals well with the Colon Cancer data set, and pSC generates balanced clusters on the WDBC data set. But for high-dimensional clustering problems, their F-scores are lower than those of the proposed NAG-pSC algorithm. This is because the attributes of the instances carry different information and make different contributions to the clustering, so improper feature selection can have a great impact on the clustering results. Traditional spectral clustering algorithms do not take this into account and are therefore susceptible to the interference of noise and irrelevant attributes. For further comparison, Table 2 lists the overall F index of each algorithm and the number of condition attributes of the different data sets. Table 2 shows that the NAG-pSC algorithm deals well with high-dimensional data. The NAG-pSC algorithm uses neighborhood rough sets to optimize data instances. The neighborhood attribute reduction based on information entropy diminishes the negative impact of noise data and redundant attributes on the clustering. So in most cases, the NAG-pSC algorithm has higher clustering accuracy. The NAG-pSC algorithm combines the advantages of p-spectral clustering and neighborhood attribute granulation. It has good robustness and strong generalization ability.

Conclusions
To improve the performance of p-spectral clustering on high-dimensional data, we modify the attribute reduction method based on neighborhood rough sets. In the new method, attribute significance is combined with information entropy to select appropriate attributes. We then propose the NAG-pSC algorithm based on the optimized attribute reduction set. Experiments show that the NAG-pSC algorithm is superior to traditional spectral clustering, density sensitive spectral clustering and p-spectral clustering. In the future, we will study how to apply the NAG-pSC algorithm to web data mining, image retrieval and other real-world scenarios.


Figure 1. Clustering results on different datasets
Table 1. Data sets used in the experiments

Table 2. Total F index of different algorithms