Local Component Analysis - Inria - Institut national de recherche en sciences et technologies du numérique
Preprint, Working Paper. Year: 2011

Local Component Analysis

Abstract

Kernel density estimation, a.k.a. Parzen windows, is a popular density
estimation method, which can be used for outlier detection or clustering.
With multivariate data, its performance is heavily reliant on the metric
used within the kernel. Most earlier work has focused on learning only the
bandwidth of the kernel (i.e., a scalar multiplicative factor). In this
paper, we propose to learn a full Euclidean metric through an
expectation-maximization (EM) procedure, which can be seen as an
unsupervised counterpart to neighbourhood component analysis (NCA). In order to
avoid overfitting with a fully nonparametric density estimator in high
dimensions, we also consider a semi-parametric Gaussian-Parzen density
model, where some of the variables are modelled through a jointly Gaussian
density, while others are modelled through Parzen windows. For these two
models, EM leads to simple closed-form updates based on matrix inversions and
eigenvalue decompositions. We show empirically that our method leads to
density estimators with higher test-likelihoods than natural competing
methods, and that the metrics may be used within most unsupervised learning
techniques that rely on such metrics, such as spectral clustering or manifold
learning methods. Finally, we present a stochastic approximation scheme which
allows for the use of this method in a large-scale setting.
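
To make the role of the metric concrete, here is a minimal sketch of a Gaussian Parzen-window density whose kernel uses a full positive-definite matrix rather than a scalar bandwidth. It only evaluates the density for a given metric; it does not implement the EM updates of the paper. The function name `parzen_log_density`, the synthetic data, and the bandwidth rule used in the demo are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def parzen_log_density(x_query, x_train, metric):
    """Log-density of a Gaussian Parzen estimator with a full metric.

    Hypothetical helper: each kernel is a Gaussian centred on a training
    point with covariance inv(metric), so `metric` must be symmetric
    positive definite.
    """
    n, d = x_train.shape
    diff = x_query[:, None, :] - x_train[None, :, :]            # (m, n, d)
    # Squared Mahalanobis distance between every query/train pair.
    sq_dist = np.einsum("mnd,de,mne->mn", diff, metric, diff)   # (m, n)
    _, logdet = np.linalg.slogdet(metric)
    log_norm = 0.5 * logdet - 0.5 * d * np.log(2.0 * np.pi)
    log_kernels = log_norm - 0.5 * sq_dist
    # Numerically stable log of the mean kernel value over training points.
    mx = log_kernels.max(axis=1, keepdims=True)
    log_sum = mx[:, 0] + np.log(np.exp(log_kernels - mx).sum(axis=1))
    return log_sum - np.log(n)


# Demo on synthetic data with very different scales per coordinate.
rng = np.random.default_rng(0)
data = rng.normal(size=(400, 2)) * np.array([5.0, 0.2])
train, test = data[:300], data[300:]

n, d = train.shape
h2 = n ** (-2.0 / (d + 4))                 # assumed rule-of-thumb bandwidth
cov = np.cov(train.T)

scalar_metric = np.eye(d) / (h2 * np.trace(cov) / d)   # bandwidth only
full_metric = np.linalg.inv(h2 * cov)                  # full Euclidean metric

print("scalar bandwidth:", parzen_log_density(test, train, scalar_metric).mean())
print("full metric:     ", parzen_log_density(test, train, full_metric).mean())
```

The two printed numbers are mean held-out log-likelihoods, the quantity the abstract uses to compare density estimators. Here the full metric is set from the sample covariance purely as a stand-in; the paper instead learns it.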
Main file
lca_arxiv.pdf (2.23 MB)
Origin: Files produced by the author(s)

Dates and versions

inria-00617965 , version 1 (31-08-2011)
inria-00617965 , version 2 (01-09-2011)
inria-00617965 , version 3 (27-09-2011)
inria-00617965 , version 4 (10-12-2012)

Identifiers

  • HAL Id : inria-00617965 , version 1
  • ARXIV : 1109.0093

Cite

Nicolas Le Roux, Francis Bach. Local Component Analysis. 2011. ⟨inria-00617965v1⟩