Similarity by diverting supervised machine learning — Application to knowledge discovery in multimedia content

Amélie Royer (1), Vincent Claveau (1), Guillaume Gravier (1), Teddy Furon (1)
(1) LinkMedia - Creating and exploiting explicit links between multimedia fragments
Inria Rennes – Bretagne Atlantique, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract : Knowledge discovery is the task of extracting new information from large databases, such as recurrent patterns or structural cues. In this framework, cluster analysis refers to the sub-domain dealing with partitioning a given data space such that two samples in the same cluster are similar, while those in different ones are not. Clustering algorithms exploit an input similarity measure on the samples, which should be fine-tuned with the data format and the application at hand. However, manually defining a suitable similarity measure is a difficult task in case of limited prior knowledge or complex data structures, for example. The purpose of this internship is to investigate an approach for automatically building such a measure by taking advantage of the discriminative abilities of state-of-the-art classification techniques. While classification systems usually require a set of samples annotated with their ground-truth classes, recent work has shown it is possible to exploit classifiers trained on an artificial annotation of the data in order to induce a similarity measure. In this report, after introducing related scientific background, we propose a unified framework, SIC (Similarity by Iterative Classifications), which explores the idea of diverting supervised learning for automatic similarity inference. We study several of its theoretical and practical aspects. We also implement and evaluate SIC on three tasks of knowledge discovery on multimedia content. Results show that in most situations the proposed approach indeed benefits from the underlying classifier's properties and outperforms usual similarity measures for clustering applications.
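
The abstract does not spell out the SIC procedure, so the following is only a rough sketch of the general idea it mentions: train classifiers on an artificial annotation of the data, and count how often two samples receive the same predicted class. Everything here is an assumption made for illustration, not the report's method: the nearest-centroid classifier, the random labelling scheme, the normalisation, and the names `fit_centroids`, `predict`, `induced_similarity`, `n_iter`, `n_classes` and `n_train` are all hypothetical.

```python
import random
from itertools import combinations


def fit_centroids(points, labels):
    """Toy 'training' step: compute the mean point of each artificial class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: [sum(c) / len(ps) for c in zip(*ps)] for y, ps in groups.items()}


def predict(centroids, p):
    """Return the label whose centroid is closest to p (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])),
    )


def induced_similarity(data, n_iter=200, n_classes=3, n_train=3, seed=0):
    """Estimate pairwise similarity as the co-classification rate.

    Assumed scheme: at each iteration, a random subset of samples gets
    random (artificial) labels, a classifier is trained on it, and the
    remaining samples are classified. Two samples count as agreeing
    when they receive the same predicted label.
    """
    rng = random.Random(seed)
    n = len(data)
    agree = [[0] * n for _ in range(n)]  # times i and j got the same label
    seen = [[0] * n for _ in range(n)]   # times i and j were both classified
    for _ in range(n_iter):
        order = list(range(n))
        rng.shuffle(order)
        train, test = order[:n_train], order[n_train:]
        labels = [rng.randrange(n_classes) for _ in train]  # artificial annotation
        model = fit_centroids([data[i] for i in train], labels)
        pred = {i: predict(model, data[i]) for i in test}
        for i, j in combinations(test, 2):
            seen[i][j] += 1
            seen[j][i] += 1
            if pred[i] == pred[j]:
                agree[i][j] += 1
                agree[j][i] += 1
    # Normalise counts into [0, 1]; a sample is fully similar to itself.
    return [
        [
            1.0 if i == j else (agree[i][j] / seen[i][j] if seen[i][j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Two well-separated groups of 2-D points (made-up data for the example).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
sim = induced_similarity(points)
```

The resulting matrix `sim` could then be handed to any clustering algorithm that accepts a precomputed similarity. In this toy setting, pairs drawn from the same group tend to score higher than pairs drawn from different groups, which is the property a clustering step would rely on.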

https://hal.inria.fr/hal-01285965
Contributor : Vincent Claveau
Submitted on : Thursday, March 10, 2016 - 11:53:39 AM
Last modification on : Thursday, February 7, 2019 - 4:52:03 PM
Long-term archiving on : Sunday, November 13, 2016 - 1:46:55 PM

File

RR-8880.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01285965, version 1

Citation

Amélie Royer, Vincent Claveau, Guillaume Gravier, Teddy Furon. Similarity by diverting supervised machine learning — Application to knowledge discovery in multimedia content. [Research Report] RR-8880, Inria Rennes Bretagne Atlantique; UMR IRISA. 2015. ⟨hal-01285965⟩
