High-Dimensional Data Clustering

Charles Bouveyron; Stéphane Girard; Cordelia Schmid

doi:10.1016/j.csda.2007.02.009

Journal article: Computational Statistics and Data Analysis, 2007

Charles Bouveyron

Function : Author
PersonId : 347
IdHAL : charles-bouveyron
ORCID : 0000-0002-6956-4491
IdRef : 112244785

Department of Mathematics & Statistics

Modelling and Inference of Complex and Structured Stochastic Systems

Stéphane Girard

Function : Author
PersonId : 6170
IdHAL : stephane-girard
ORCID : 0000-0003-0098-2369
IdRef : 112225497

Modelling and Inference of Complex and Structured Stochastic Systems

Cordelia Schmid

Function : Author
PersonId : 831154

Learning and recognition in vision

Abstract

Clustering in high-dimensional spaces is a difficult problem that recurs in many domains, for example in image analysis. The difficulty arises because high-dimensional data usually live in different low-dimensional subspaces hidden in the original space. This paper presents a family of Gaussian mixture models designed for high-dimensional data which combine the ideas of dimension reduction and parsimonious modeling. These models give rise to a clustering method based on the Expectation-Maximization algorithm, called High-Dimensional Data Clustering (HDDC). In order to fit the data correctly, HDDC estimates the specific subspace and the intrinsic dimension of each group. Our experiments on artificial and real datasets show that HDDC outperforms existing methods for clustering high-dimensional data.
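
The idea of estimating a specific subspace and an intrinsic dimension for each group can be illustrated with a minimal sketch. It is not the HDDC algorithm itself: group memberships are assumed to be already known (no Expectation-Maximization step is run), and the cumulative-variance threshold used to pick the dimension is an illustrative assumption, not the criterion used in the paper. The names `group_subspace`, `make_group` and `variance_kept` are hypothetical.

```python
import numpy as np

def group_subspace(X, variance_kept=0.95):
    """Return (d, basis) for one group of points X (n_samples x n_features).

    d is the smallest number of principal directions whose eigenvalues sum to
    at least `variance_kept` of the total variance (an illustrative rule, not
    the paper's criterion); basis holds those d directions as columns.
    """
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    eigvals = np.clip(eigvals[::-1], 0.0, None)   # descending, round-off clipped at 0
    eigvecs = eigvecs[:, ::-1]
    explained = np.cumsum(eigvals) / eigvals.sum()
    d = int(np.searchsorted(explained, variance_kept) + 1)
    d = min(d, len(eigvals))
    return d, eigvecs[:, :d]

def make_group(rng, n, p, d, signal_std=5.0, noise_std=0.01):
    """Synthetic group: n points in R^p concentrated near a random d-dim subspace."""
    q, _ = np.linalg.qr(rng.normal(size=(p, d)))  # orthonormal basis, shape (p, d)
    latent = rng.normal(scale=signal_std, size=(n, d))
    return latent @ q.T + rng.normal(scale=noise_std, size=(n, p))

# Two groups living in different low-dimensional subspaces of a 50-dimensional space.
rng = np.random.default_rng(0)
groups = {"A": make_group(rng, 500, 50, 2), "B": make_group(rng, 500, 50, 5)}
for name, X in groups.items():
    d, basis = group_subspace(X)
    print(name, d, basis.shape)
```

In a model-based method such as the one the abstract describes, a per-group estimate of this kind would be recomputed inside the iterations of the mixture fit rather than once on fixed labels.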

Keywords

Model-based clustering; high-dimensional data; Gaussian mixture models; subspace selection; dimension reduction; parsimonious models

Domains

Statistics [math.ST]; Statistics Theory [stat.TH]

Main file

RR-1083M.pdf (332.71 KB)

Origin : Files produced by the author(s)

Contributor : Charles Bouveyron

https://hal.science/hal-00022183

Submitted on : Thursday, January 4, 2007, 8:18:57 PM

Last modification on : Thursday, April 4, 2024, 9:41:19 PM

Long-term archiving on : Friday, September 24, 2010, 10:49:28 AM

Dates and versions

hal-00022183 , version 1 (04-04-2006)

hal-00022183 , version 2 (18-04-2006)

hal-00022183 , version 3 (21-12-2006)

hal-00022183 , version 4 (04-01-2007)

Identifiers

HAL Id : hal-00022183 , version 4
ARXIV : math.ST/0604064
DOI : 10.1016/j.csda.2007.02.009

Cite

Charles Bouveyron, Stéphane Girard, Cordelia Schmid. High-Dimensional Data Clustering. Computational Statistics and Data Analysis, 2007, 52 (1), pp.502-519. ⟨10.1016/j.csda.2007.02.009⟩. ⟨hal-00022183v4⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_PS LJK_GI_LEAR LJK_PS_MISTIS INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
