Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds

Kuan Liu; Aurélien Bellet

Report (Research Report), Year: 2018

Kuan Liu

Role: Author

Information Sciences Institute [California]

Aurélien Bellet

Role: Author
PersonId : 9877
IdHAL : aurelien-bellet
ORCID : 0000-0003-3440-1251
IdRef : 17653136X

Machine Learning in Information Networks

Abstract

Similarity and metric learning provides a principled approach to construct a task-specific similarity from weakly supervised data. However, these methods are subject to the curse of dimensionality: as the number of features grows large, poor generalization is to be expected and training becomes intractable due to high computational and memory costs. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is achieved through a parameterization of similarity functions by convex combinations of sparse rank-one matrices, together with the use of a greedy approximate Frank-Wolfe algorithm which provides an efficient way to control the number of active features. We show that the convergence rate of the algorithm, as well as its time and memory complexity, are independent of the data dimension. We further provide a theoretical justification of our modeling choices through an analysis of the generalization error, which depends logarithmically on the sparsity of the solution rather than on the number of features. Our experiments on datasets with up to one million features demonstrate the ability of our approach to generalize well despite the high dimensionality as well as its superiority compared to several competing methods.
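To make the parameterization in the abstract more concrete, here is a minimal, hypothetical Python sketch. It is not the authors' implementation. It assumes a bilinear similarity S_M(x, x') = x^T M x', an atom family of 4-sparse rank-one matrices lam * (e_i ± e_j)(e_i ± e_j)^T, and a mean triplet hinge loss. The abstract does not state any of these specifics. The sketch runs standard Frank-Wolfe with step size 2/(k+2), so each iteration mixes in one atom. After k iterations, M therefore involves at most 2k features and stores at most 4k entries, whatever the dimension d. The linear minimization step used here is an exact scan over all O(d^2) atoms. That scan is the cost the report's greedy approximate step is meant to avoid, and that approximation is not reproduced. The function names (similarity, atom, gradient, frank_wolfe, active_features) are illustrative only.

```python
from itertools import combinations

# Sparse vectors and matrices are plain dicts: {index: value} and {(row, col): value}.


def similarity(M, x, y):
    """Bilinear similarity S_M(x, y) = x^T M y, touching only the stored entries of M."""
    return sum(v * x.get(i, 0.0) * y.get(j, 0.0) for (i, j), v in M.items())


def atom(i, j, sign, lam):
    """Assumed 4-sparse rank-one atom lam * (e_i + sign * e_j)(e_i + sign * e_j)^T."""
    return {(i, i): lam, (j, j): lam, (i, j): sign * lam, (j, i): sign * lam}


def gradient(M, triplets):
    """Gradient w.r.t. M of the mean hinge loss max(0, 1 - S(x, x+) + S(x, x-))."""
    G, n = {}, len(triplets)
    for x, x_pos, x_neg in triplets:
        if 1.0 - similarity(M, x, x_pos) + similarity(M, x, x_neg) > 0.0:
            # Active triplet: the gradient of -x^T M x+ + x^T M x- is -x x+^T + x x-^T.
            for i, xi in x.items():
                for j, v in x_pos.items():
                    G[(i, j)] = G.get((i, j), 0.0) - xi * v / n
                for j, v in x_neg.items():
                    G[(i, j)] = G.get((i, j), 0.0) + xi * v / n
    return G


def frank_wolfe(triplets, d, lam=1.0, iters=10):
    """Standard Frank-Wolfe over the convex hull of the assumed atoms (exact LMO)."""
    M = {}  # the first step size is 1, so the first iterate is a single feasible atom
    for k in range(iters):
        G = gradient(M, triplets)

        def g(a, b):
            return G.get((a, b), 0.0)

        # Exact linear minimization oracle: <G, atom> for every pair and sign, O(d^2) work.
        _, i, j, s = min(
            (lam * (g(i, i) + g(j, j) + s * (g(i, j) + g(j, i))), i, j, s)
            for i, j in combinations(range(d), 2)
            for s in (1.0, -1.0)
        )
        gamma = 2.0 / (k + 2.0)
        # Convex update M <- (1 - gamma) M + gamma * atom keeps M inside the hull.
        M = {key: (1.0 - gamma) * v for key, v in M.items()}
        for key, v in atom(i, j, s, lam).items():
            M[key] = M.get(key, 0.0) + gamma * v
    return M


def active_features(M):
    """Features touched by the stored entries of M (at most 2 new ones per iteration)."""
    return {i for i, _ in M} | {j for _, j in M}


# Toy usage: x should end up more similar to x_pos than to x_neg.
x, x_pos, x_neg = {0: 1.0}, {1: 1.0}, {2: 1.0}
M = frank_wolfe([(x, x_pos, x_neg)], d=4, lam=1.0, iters=10)
print(sorted(active_features(M)))  # prints [0, 1]
print(similarity(M, x, x_pos) > similarity(M, x, x_neg))  # prints True
```

In the toy run, every selected atom lies on the feature pair (0, 1), so only two of the four features become active. S_M(x, x+) ends above S_M(x, x-). Starting from an empty M is harmless: the first step size equals 1, so the first iterate is a single atom inside the feasible convex hull.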

Domains

Machine Learning [cs.LG], Machine Learning [stat.ML]

Main file

1807.07789.pdf (940.64 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01923006

Submitted on: Wednesday, November 14, 2018, 19:23:04

Last modified on: Wednesday, January 24, 2024, 09:54:24

Long-term archiving on: Friday, February 15, 2019, 16:56:46

Dates and versions

hal-01923006, version 1 (14-11-2018)

Identifiers

HAL Id: hal-01923006, version 1

Cite

Kuan Liu, Aurélien Bellet. Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds. [Research Report] Inria. 2018. ⟨hal-01923006⟩

Collections

CNRS, INRIA, CRISTAL, INRIA2, CRISTAL-MAGNET, LARA, UNIV-LILLE
