Incomplete Multi-view Clustering

Abstract. Real data often consist of multiple views (or representations). By exploiting the complementary and consensus grouping information of multiple views, multi-view clustering has become a successful practice for boosting clustering accuracy over the past decades. Recently, researchers have begun paying attention to the problem of incomplete views. Generally, they assume that at least one view is complete, or they focus only on two-view problems. However, these assumptions are often violated in real tasks. In this work, we propose an IVC algorithm for clustering with more than two incomplete views. Compared with existing works, our proposed algorithm (1) does not require any view to be complete, (2) does not limit the number of incomplete views, and (3) can handle similarity data as well as feature data. The proposed algorithm is based on spectral graph theory and the kernel alignment principle. By aligning the projections of individual views with the projection integration of all views, IVC exchanges the complementary grouping information of incomplete views. Consequently, the projections of individual views are made complete, thereby yielding a consensus with accurate grouping information. Experiments on synthetic and real datasets demonstrate the effectiveness of IVC.


Introduction
Many datasets in the real world are naturally comprised of heterogeneous views (or representations). Clustering with such data is commonly referred to as multi-view clustering. Under the assumptions of complementary data representations and a consensus decision among clusterings, multi-view clustering has the potential to dramatically increase learning accuracy over single-view clustering [1]. The main problem in multi-view clustering is how to integrate the grouping information of individual views. Existing works can be roughly classified into three categories.
(1) Multi-kernel learning based approaches. The most representative work of this category is Multi-kernel K-means [2]. It first uses a kernel representation for each view, and then incorporates the different views by seeking an optimal combination of their kernels. (2) Subspace learning based approaches. These obtain a latent consensus subspace shared by multiple views and cluster the instances in the latent subspace. There are many research works in this category, including CCA-based methods [3], spectral graph based methods [4][5][6], and matrix factorization based methods [7,8]. (3) Ensemble learning based approaches. [9] makes a decision in each individual view separately and then combines the decisions of the distinct views to establish a consensus decision by determining cluster agreements/disagreements.
Traditional research assumes data are complete in all views. However, in many real applications, some instances are not available in some views. For example, in a news story clustering task, articles are collected from different on-line news sources. Only a portion of the news stories are reported by all sources, and no single source covers all stories. Another example is image clustering, where images are described by multiple visual and textual features. Some images have only a fraction of the visual or textual feature sets.
Recently, a few attempts have been made at multi-view clustering with incomplete views. The first work to deal with incomplete view clustering was proposed in [10]. It uses one view's kernel representation as the similarity matrix and completes the incomplete view's kernel using Laplacian regularization. However, this approach requires at least one complete view containing all the instances. Shao et al. [11] relax this constraint. They collectively complete the kernel matrices of incomplete datasets by optimizing the alignment of the datasets' shared instances, and further propose a clustering algorithm based on kernel canonical correlation analysis. However, this approach focuses on the two-view problem; it cannot exploit relations among more than two views. Li et al. proposed a partial view clustering algorithm (PVC) [12]. Based on non-negative matrix factorization (NMF), PVC works by establishing a latent subspace where the instances corresponding to the same example in different views are close to each other. PVC also concentrates on the two-view problem, and extending it to more views suffers from computational problems. Most recently, Shao et al. developed an incomplete view clustering algorithm (MIC) [13]. MIC handles more than two incomplete views. With joint weighted non-negative matrix factorization, it learns an L2,1-regularized latent subspace for multiple views. Initialized with mean value imputation, MIC gives lower weights to incomplete instances than to complete instances. During optimization, MIC pushes the multiple views towards a consensus matrix iteratively. However, MIC has some limitations: it converges slowly and contains many parameters, which makes it difficult to operate. Moreover, both PVC and MIC are NMF based methods and inherit the limitations of NMF: (1) NMF cannot handle data with negative feature values, while in many real applications the non-negativity constraint cannot be satisfied. (2) NMF is essentially linear, and thus cannot disclose non-linear structures hidden in the data, which limits its learning ability. (3) NMF only deals with feature values, while in some applications we know the similarities (relationships) of instances but the detailed feature values are unavailable. Yin et al. [14] proposed a subspace learning algorithm that utilizes a regression-like objective to learn a latent consensus representation and explores the inter-view and intra-view relationships of the data examples through graph regularization. However, it converges slowly, reaching optimal results only after about one hundred iterations, which makes it difficult to extend to more than two views.
In this paper, we focus on the problem of incomplete view clustering with more than two views. We propose a novel incomplete multi-view clustering (IVC) algorithm. Aiming at completing incomplete views, IVC first integrates the individual views by collective spectral decomposition. Then, IVC aligns each individual view with the integration. In this way, complementary grouping information is shared among views and the missing values of incomplete views are estimated. With the estimated individual views, IVC constructs a latent consensus space. Finally, the clustering solution is obtained by applying standard spectral clustering in the consensus space. Compared with previous works, the proposed algorithm has several advantages: (1) It does not require any view to be complete. (2) It does not limit the number of incomplete views. (3) It can handle similarity data (or kernel data) as well as feature data. (4) Since it has few parameters to set, it is easy to implement. (5) Due to its non-iterative optimization, it is more efficient than iterative algorithms such as MIC. Moreover, it shows better performance, as we demonstrate in the experiments.
The rest of this paper is organized as follows: Section 2 gives a brief review of spectral clustering and the kernel alignment principle, which form our basis. Section 3 presents the details of the proposed algorithm. Section 4 validates the proposed algorithm. Section 5 concludes the paper.

Preliminary
In this section, we give a brief review of spectral clustering and the kernel alignment principle, which provide the necessary background and pave the way for the proposed algorithm.

Spectral Clustering
Spectral clustering is a theoretically sound and empirically successful clustering algorithm. It treats clustering as a graph partitioning problem. Making use of spectral graph theory, it projects the original data into a low-dimensional space that contains more discriminative grouping information. Algorithm 1 briefly describes the spectral clustering algorithm [15], which is the basis of our work.
The equivalent optimization formulation of Algorithm 1 is Equation (1):

max_U tr(U^T L U)  s.t.  U^T U = I,   (1)

where L = D^{-1/2} S D^{-1/2} is the normalized graph Laplacian of the similarity matrix S as defined in [15], D is the degree matrix, and the rows of the optimal U form the spectral embedding of the instances.
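As a concrete reference, the sketch below implements Algorithm 1 with NumPy and scikit-learn. It is a minimal sketch of standard normalized spectral clustering in the style of [15]; the function names and the choice of eigensolver are our own, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_embedding(S, K):
    """Top-K eigenvectors of D^{-1/2} S D^{-1/2}, row-normalized (Equation 1)."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = S * np.outer(d_inv_sqrt, d_inv_sqrt)        # normalized graph Laplacian form
    _, vecs = np.linalg.eigh(L)                     # eigenvalues ascend, so take the tail
    U = vecs[:, -K:]                                # eigenvectors of the K largest eigenvalues
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)  # row-normalize as in [15]
    return U

def spectral_clustering(S, K):
    """Algorithm 1: embed the graph, then run K-means on the rows of U."""
    return KMeans(n_clusters=K, n_init=10).fit_predict(spectral_embedding(S, K))
```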

Proposed Methods
In this section, we present the details of the incomplete view clustering (IVC) algorithm. We first describe the IVC framework and present its objectives, and then describe the optimization procedures.

Model Description
Given V incomplete views with similarity matrices S^(i), i = 1, 2, ..., V, and cluster number K. Incomplete views contain different numbers of observed values. In order to make the kernel matrices co-operable (i.e., all of size N), we initialize incomplete kernels by filling missing entries with the corresponding column averages (the early estimation). First, we exploit the discriminative grouping information of each individual view by spectral decomposition on its similarity matrix S^(i):

max_{U^(i)} tr(U^(i)T L^(i) U^(i))  s.t.  U^(i)T U^(i) = I,   (2)

where L^(i) is the normalized Laplacian of S^(i). Note that U^(i) is a recast matrix of the original feature matrix: each row of U^(i) is a new representation of an instance, with lower dimension and more discriminative grouping information. Next, in order to make the different views consistent, we push them towards a latent consensus matrix U*. Because U^(i) is a projection of the original feature matrix, S^(i) = U^(i) U^(i)T can be seen as a new kernel representation. Similarly, the latent consensus kernel can be decomposed as

S* = U* U*^T,   (3)

where U* is the latent projected matrix. Note that the U^(i)'s are derived from kernels with early estimation; we therefore call U* the early consensus projection.
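To illustrate the early estimation step, here is a minimal sketch under our own conventions: each view's kernel is an N x N matrix, and `observed` is a boolean mask over instances. The names `S_list`, `observed_list`, and `early_estimate` are our assumptions, not the paper's notation; `spectral_embedding` comes from the earlier sketch.

```python
import numpy as np

def early_estimate(S, observed):
    """Early estimation: fill the row/column of each missing instance with the
    column averages computed over the observed instances."""
    S = S.copy()
    col_mean = S[observed].mean(axis=0)     # per-column average over observed rows
    for j in np.where(~observed)[0]:
        S[j, :] = col_mean                  # fill the missing instance's row
        S[:, j] = col_mean                  # and its column, keeping S symmetric
    return S

# One embedding per view, built on the early-estimated kernels.
U_views = [spectral_embedding(early_estimate(S, obs), K)
           for S, obs in zip(S_list, observed_list)]
```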
Borrowing the idea of kernel alignment, we measure the dissimilarity between the early consensus and each view by Equation (4):

D(U^(i), U*) = || U^(i) U^(i)T - U* U*^T ||_F^2.   (4)
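Expanding Equation (4) under the orthonormality constraints U^(i)T U^(i) = I and U*^T U* = I shows why constant factors drop out later (our own intermediate step, implicit in the paper):

```latex
\begin{aligned}
\|U^{(i)}U^{(i)\top} - U^{*}U^{*\top}\|_F^2
 &= \mathrm{tr}\big(U^{(i)}U^{(i)\top}U^{(i)}U^{(i)\top}\big)
  - 2\,\mathrm{tr}\big(U^{*\top}U^{(i)}U^{(i)\top}U^{*}\big)
  + \mathrm{tr}\big(U^{*}U^{*\top}U^{*}U^{*\top}\big) \\
 &= 2K - 2\,\mathrm{tr}\big(U^{*\top}U^{(i)}U^{(i)\top}U^{*}\big).
\end{aligned}
```

Hence minimizing the dissimilarity (4) is equivalent to maximizing tr(U*^T U^(i) U^(i)T U*).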
Minimizing the sum of dissimilarities between the early consensus and all individual views, we get objective function (5), where λ_i is the tradeoff between different views and expresses the importance of view i in clustering:

min_{U*} Σ_i λ_i || U^(i) U^(i)T - U* U*^T ||_F^2  s.t.  U*^T U* = I.   (5)

By ignoring constant factors and using the trace property tr(AA^T) = ||A||_F^2 (see the expansion above), we rewrite objective function (5) as

max_{U*} Σ_i λ_i tr( U* U*^T U^(i) U^(i)T )  s.t.  U*^T U* = I.   (6)

Now, we retransmit the early consensus back to the individual views. Specifically, we reorder each individual view as U^(i) = [U^(i)_a ; U^(i)_e], where U^(i)_a is the part derived from available (observed) values and U^(i)_e is the part derived from estimated (missing) values. Correspondingly, we reorder U* as [U*_a ; U*_e]. Then, we update each U^(i)_e by aligning U^(i) with U*. According to Equation (4), we get objective function (7):

min_{U^(i)_e} || U^(i) U^(i)T - U* U*^T ||_F^2.   (7)

In this way, complementary grouping information is exchanged among the incomplete individual views. With the updated U^(i)'s, we construct the final consensus U*_f by Equation (6). U*_f contains more accurate grouping information than U*. Finally, we apply standard K-means clustering on U*_f to obtain the final decision.
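The following sketch realizes the two alignment steps on top of the previous snippets. The exact closed form of the update (Equation (10)) is derived in the next subsection; here, as a simplified stand-in, the estimated rows are replaced by the corresponding consensus rows, which conveys the intent of objective (7) without reproducing the paper's derivation. The λ weights default to 1.

```python
import numpy as np

def consensus_projection(U_views, K, lambdas=None):
    """Consensus U*: top-K eigenvectors of sum_v lambda_v U_v U_v^T (Equations 6/8)."""
    lambdas = lambdas if lambdas is not None else [1.0] * len(U_views)
    G = sum(lam * U @ U.T for lam, U in zip(lambdas, U_views))
    _, vecs = np.linalg.eigh(G)
    return vecs[:, -K:]

def align_update(U, U_star, observed):
    """Update the estimated part U_e by aligning the view with the consensus
    (objective 7). Simplified stand-in for the paper's closed form (Equation 10):
    copy the consensus rows for the missing instances."""
    U = U.copy()
    U[~observed] = U_star[~observed]
    return U
```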

Model Training
In this subsection, we demonstrate how IVC optimizes Equations (6) and (7).
By the cyclic property of the trace, we transform optimization problem (6) into (8), which is equivalent to standard spectral clustering with the graph Laplacian Σ_v λ_v U^(v) U^(v)T:

max_{U*} tr( U*^T ( Σ_v λ_v U^(v) U^(v)T ) U* )  s.t.  U*^T U* = I.   (8)

The solution for U* is simply the matrix of optimal consensus eigenvectors over all individual views. Transforming and expanding Equation (7) as Equation (9), then taking its derivative with respect to U^(i)_e and setting it to zero, we obtain the closed-form solution in Equation (10). Finally, the overall procedure of IVC is summarized in Algorithm 2. IVC first initializes the incomplete kernels with early estimation. Then, it projects each individual view into a more discriminative space by spectral decomposition. Next, IVC establishes the early consensus projection and thereby updates the individual projections. With these updated individual projections, IVC constructs the final consensus projection.
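Putting the pieces together, a minimal end-to-end sketch of Algorithm 2 follows, reusing the helper functions from the previous snippets; it mirrors the summarized procedure, not the paper's exact implementation.

```python
from sklearn.cluster import KMeans

def ivc(S_list, observed_list, K, lambdas=None):
    """Algorithm 2 sketch: early estimation -> per-view spectral embedding ->
    early consensus -> alignment update -> final consensus -> K-means."""
    U_views = [spectral_embedding(early_estimate(S, obs), K)
               for S, obs in zip(S_list, observed_list)]
    U_star = consensus_projection(U_views, K, lambdas)    # early consensus (Eq. 8)
    U_views = [align_update(U, U_star, obs)               # exchange grouping information
               for U, obs in zip(U_views, observed_list)]
    U_final = consensus_projection(U_views, K, lambdas)   # final consensus U*_f
    return KMeans(n_clusters=K, n_init=10).fit_predict(U_final)
```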

Comparison methods
We compare the proposed IVC with several state-of-the-art methods, including MIC [13], MvSpec, and Concat. The datasets are as follows.

Flower17 Dataset: flower images are described by different visual features based on color, shape, and texture. χ2 distance matrices for the different flower features (color, shape, texture) are used as three different views.
Reuters Multilingual Dataset (Reuters): This dataset contains six samples of 1200 documents, balanced over 6 labels (E21, CCAT, M11, GCAT, C15, ECAT). Each sample is made of 5 views (EN, FR, GR, IT, SP) on the same documents. The documents were originally in English, and the FR, GR, IT, and SP views correspond to the words of their translations in French, German, Italian, and Spanish, respectively.
Multi-feature Digit Dataset (Mfeat) [18]: This dataset consists of features of handwritten numerals ('0'-'9') extracted from a collection of Dutch utility maps. 200 patterns per class (2,000 patterns in total) have been digitized into binary images. The digits are represented by the following five feature sets: mfeat-fou, mfeat-fac, mfeat-kar, mfeat-pix, and mfeat-zer.
All original datasets are complete, so we simulate incomplete views for them. Specifically, we vary the incomplete ratio from 0% to 90% in 10% intervals. Incomplete instances are distributed evenly across all views, and each instance is available in at least one view.
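For reproducibility, one plausible reading of this protocol is sketched below: each incomplete instance is dropped from all but one view, with the retained view rotating so missing entries spread evenly. The function and its exact policy are our assumptions, not the paper's stated procedure.

```python
import numpy as np

def simulate_incomplete(N, n_views, ratio, seed=0):
    """One boolean observed-mask per view. `ratio` of the instances become
    incomplete; each is kept observed in exactly one view, rotating over views
    so missing entries spread evenly."""
    rng = np.random.default_rng(seed)
    observed = np.ones((n_views, N), dtype=bool)
    incomplete = rng.choice(N, size=int(ratio * N), replace=False)
    for idx, i in enumerate(incomplete):
        keep = idx % n_views                  # the one view where instance i stays observed
        for v in range(n_views):
            if v != keep:
                observed[v, i] = False
    return observed
```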

Results
The NMIs on the four datasets are plotted in Figure 1. For the synthetic data, IVC shows the best NMI. IVC, MIC, and Concat perform stably even when the incomplete ratio is close to 90%, while the NMIs of the other methods drop sharply as the incomplete ratio rises.
For Flower17, all methods show a downward trend as the incomplete ratio increases. IVC shows relatively better NMI than the others, and MvSpec is the second best method. Note that MIC shows the worst performance; a possible reason is that NMF-based methods are not suitable for similarity data (we apply MIC to the kernel data of Flower17, as in the original paper [13]).
The results on Reuters are similar to those on Flower17: IVC demonstrates a slight advantage over MIC and a more obvious advantage over the others.
For Mfeat, at low incomplete ratios (below 20%), all methods except Concat show close NMIs. As the incomplete ratio rises, IVC shows an increasingly obvious superiority over the others. We can conclude that although views are incomplete, their integration can still be more useful than a single complete view. Among the above multi-view methods, IVC achieves the most accurate clustering for incomplete views in most cases.

Conclusion
In this paper, we propose the IVC algorithm for clustering multiple incomplete views. IVC initializes incomplete views with early estimation. Based on spectral graph theory, IVC projects the original data into a new space with more discriminative grouping information. Then, the individual projections are integrated. By aligning the individual projections with the projection integration, the estimated parts of the individual projections are updated to be more accurate. With these updated individual projections, the final consensus is established, and standard K-means is applied on it. Compared with existing works, our proposed algorithm (1) does not require any view to be complete, (2) does not limit the number of incomplete views, and (3) can handle similarity data as well as feature data. Experimental results validate the effectiveness of the IVC algorithm.
