Similarity by diverting supervised machine learning — Application to knowledge discovery in multimedia content

Amélie Royer 1 Vincent Claveau 1 Guillaume Gravier 1 Teddy Furon 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
Inria Rennes – Bretagne Atlantique, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract: Knowledge discovery is the task of extracting new information from large databases, such as recurrent patterns or structural cues. In this framework, cluster analysis refers to the sub-domain dealing with partitioning a given data space such that two samples in the same cluster are similar, while those in different ones are not. Clustering algorithms exploit an input similarity measure on the samples, which should be fine-tuned to the data format and the application at hand. However, manually defining a suitable similarity measure is a difficult task, for example when prior knowledge is limited or the data structures are complex. The purpose of this internship is to investigate an approach for automatically building such a measure by taking advantage of the discriminative abilities of state-of-the-art classification techniques. While classification systems usually require a set of samples annotated with their ground-truth classes, recent work has shown that it is possible to exploit classifiers trained on an artificial annotation of the data in order to induce a similarity measure. In this report, after introducing the related scientific background, we propose a unified framework, SIC (Similarity by Iterative Classifications), which explores the idea of diverting supervised learning for automatic similarity inference. We study several of its theoretical and practical aspects. We also implement and evaluate SIC on three knowledge discovery tasks on multimedia content. Results show that in most situations the proposed approach indeed benefits from the underlying classifier's properties and outperforms usual similarity measures for clustering applications.
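The idea of inducing a similarity from classifiers trained on an artificial annotation can be illustrated with a minimal sketch. It is not the SIC algorithm as specified in the report: the nearest-centroid classifier, the round-robin artificial labels, the co-classification count, and all names and parameters (`co_classification_similarity`, `n_iter`, `n_classes`, `n_train`) are assumptions chosen only to keep the example self-contained.

```python
import random


def nearest_centroid_fit(points, labels):
    """Return {label: centroid} computed from the labeled training points."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, p):
    """Return the label whose centroid is closest to p (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def co_classification_similarity(data, n_iter=200, n_classes=2, n_train=4, seed=0):
    """Similarity of i and j = fraction of iterations in which a classifier
    trained on an artificial annotation puts i and j in the same class.
    (Illustrative assumption, not the report's exact procedure.)"""
    rng = random.Random(seed)
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_iter):
        # Artificial annotation: draw a random training subset and assign
        # arbitrary class labels that ignore any ground truth.
        train_idx = rng.sample(range(n), n_train)
        labels = [k % n_classes for k in range(n_train)]
        centroids = nearest_centroid_fit([data[i] for i in train_idx], labels)
        # Classify every sample with the classifier trained on fake labels.
        pred = [nearest_centroid_predict(centroids, p) for p in data]
        for i in range(n):
            for j in range(n):
                if pred[i] == pred[j]:
                    counts[i][j] += 1
    return [[c / n_iter for c in row] for row in counts]


if __name__ == "__main__":
    # Two well-separated toy groups (hypothetical data).
    toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    for row in co_classification_similarity(toy):
        print(" ".join(f"{v:.2f}" for v in row))
```

The resulting matrix is symmetric with ones on the diagonal, so it can be passed to any clustering algorithm that accepts a precomputed similarity. Swapping in a stronger classifier changes what the induced similarity captures, which is the property the abstract refers to.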
Contributor: Vincent Claveau
Submitted on: Thursday, 10 March 2016 - 11:53:39
Last modified on: Wednesday, 2 August 2017 - 10:06:38
Document(s) archived on: Sunday, 13 November 2016 - 13:46:55




  • HAL Id: hal-01285965, version 1


Amélie Royer, Vincent Claveau, Guillaume Gravier, Teddy Furon. Similarity by diverting supervised machine learning — Application to knowledge discovery in multimedia content. [Research Report] RR-8880, Inria Rennes Bretagne Atlantique; UMR IRISA. 2015. <hal-01285965>


