Skip to Main content Skip to Navigation
Journal articles

Comparing Two Clusterings Using Matchings between Clusters of Clusters

Abstract : Clustering is a fundamental problem in data science, yet, the variety of clusteringmethods and their sensitivity to parameters make clustering hard. To analyze the stability of agiven clustering algorithm while varying its parameters, and to compare clusters yielded by differentalgorithms, several comparison schemes based on matchings, information theory and various indices(Rand, Jaccard) have been developed. We go beyond these by providing a novel class of methodscomputing meta-clusters within each clustering– a meta-cluster is a group of clusters, togetherwith a matching between these.Let the intersection graph of two clusterings be the edge-weighted bipartite graph in which thenodes represent the clusters, the edges represent the non empty intersection between two clus-ters, and the weight of an edge is the number of common items. We introduce the so-calledD-family-matching problem on intersection graphs, withDthe upper-bound on the diameter ofthe graph induced by the clusters of any meta-cluster. First we prove NP-completeness resultsand unbounded approximation ratio of simple strategies. Second, we design exact polynomial timedynamic programming algorithms for some classes of graphs (in particular trees). Then, we provespanning-tree based efficient algorithms for general graphs.Our experiments illustrate the role ofDas a scale parameter providing information on the rela-tionship between clusters within a clustering and in-between two clusterings. They also show theadvantages of our built-in mapping over classical cluster comparison measures such as the variationof information (VI)
Document type :
Journal articles
Complete list of metadata
Contributor : Frederic Cazals <>
Submitted on : Monday, December 30, 2019 - 7:23:04 PM
Last modification on : Thursday, January 21, 2021 - 2:32:02 PM

Links full text



Frédéric Cazals, Dorian Mazauric, Romain Tetley, Rémi Watrigant. Comparing Two Clusterings Using Matchings between Clusters of Clusters. ACM Journal of Experimental Algorithmics, Association for Computing Machinery, 2019, 24 (1), pp.1-41. ⟨10.1145/3345951⟩. ⟨hal-02425599⟩



Record views