Inria - Institut national de recherche en sciences et technologies du numérique
Report (Research Report) Year: 2015

Similarity by diverting supervised machine learning — Application to knowledge discovery in multimedia content

French title: Similarité par détournement de méthodes d'apprentissage supervisées - Application à la découverte de connaissances dans les contenus multimédias

Abstract

Knowledge discovery is the task of extracting new information from large databases, such as recurrent patterns or structural cues. In this framework, cluster analysis refers to the sub-domain dealing with partitioning a given data space such that two samples in the same cluster are similar, while those in different ones are not. Clustering algorithms exploit an input similarity measure on the samples, which should be fine-tuned with the data format and the application at hand. However, manually defining a suitable similarity measure is a difficult task in case of limited prior knowledge or complex data structures, for example.

The purpose of this internship is to investigate an approach for automatically building such a measure by taking advantage of the discriminative abilities of state-of-the-art classification techniques. While classification systems usually require a set of samples annotated with their ground-truth classes, recent work has shown it is possible to exploit classifiers trained on an artificial annotation of the data in order to induce a similarity measure. In this report, after introducing related scientific background, we propose a unified framework, SIC (Similarity by Iterative Classifications), which explores the idea of diverting supervised learning for automatic similarity inference. We study several of its theoretical and practical aspects. We also implement and evaluate SIC on three tasks of knowledge discovery on multimedia content. Results show that in most situations the proposed approach indeed benefits from the underlying classifier's properties and outperforms usual similarity measures for clustering applications.
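The general idea of inducing a similarity from classifiers trained on artificial labels can be illustrated with a small sketch. This is not the SIC algorithm from the report, whose details are not given on this page. It is a toy version under stated assumptions: the function name `sic_like_similarity`, the nearest-centroid classifier, the balanced random labeling of a few training samples, and all parameter names and defaults are hypothetical choices made only for illustration. The similarity of two samples is taken to be the fraction of rounds in which the classifier puts them in the same artificial class.

```python
import random


def nearest_centroid_fit(points, labels):
    """Compute one centroid per artificial class (a deliberately simple classifier)."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, p):
    """Assign p to the class with the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])),
    )


def sic_like_similarity(data, n_iter=200, n_classes=2, train_size=2, seed=0):
    """Toy similarity: how often two samples fall in the same artificial class.

    Assumption (not from the report): each round draws `train_size` samples at
    random, labels them with balanced artificial classes, trains the classifier
    on them, and then classifies every sample.
    """
    n = len(data)
    if not n_classes <= train_size <= n:
        raise ValueError("need n_classes <= train_size <= len(data)")
    rng = random.Random(seed)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_iter):
        # Artificial annotation: random training samples, labels 0..n_classes-1.
        train_idx = rng.sample(range(n), train_size)
        labels = [k % n_classes for k in range(train_size)]
        centroids = nearest_centroid_fit([data[i] for i in train_idx], labels)
        preds = [nearest_centroid_predict(centroids, p) for p in data]
        # Count co-classification of every pair of samples.
        for i in range(n):
            for j in range(n):
                if preds[i] == preds[j]:
                    counts[i][j] += 1
    return [[c / n_iter for c in row] for row in counts]
```

The returned matrix is symmetric with ones on the diagonal, so it could be handed to any clustering routine that accepts a precomputed similarity. A stronger classifier would be expected to yield a more informative measure, which is the property the abstract refers to.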
Main file
RR-8880.pdf (9.62 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01285965, version 1 (10-03-2016)

Identifiers

  • HAL Id: hal-01285965, version 1

Cite

Amélie Royer, Vincent Claveau, Guillaume Gravier, Teddy Furon. Similarity by diverting supervised machine learning — Application to knowledge discovery in multimedia content. [Research Report] RR-8880, Inria Rennes Bretagne Atlantique; UMR IRISA. 2015. ⟨hal-01285965⟩
1107 views
48 downloads
