Audio Event Detection in Movies using Multiple Audio Words and Contextual Bayesian Networks

Cédric Penet (1, 2), Claire-Hélène Demarty (1), Guillaume Gravier (1, 2), Patrick Gros (1, 2)
(2) TEXMEX - Multimedia content-based indexing, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This article investigates a novel use of the well-known audio word representation to detect specific audio events, namely gunshots and explosions, with the aim of improving robustness to soundtrack variability in Hollywood movies. An audio stream is processed as a sequence of stationary segments. Each segment is described by one or several audio words obtained by applying product quantization to standard features. Such a representation, using multiple audio words constructed via product quantization, is one of the novelties of this work. Based on this representation, Bayesian networks are used to exploit contextual information in order to detect audio events. Experiments are performed on a comprehensive set of 15 movies, which has been made publicly available. Results are comparable to state-of-the-art results obtained on the same dataset and show increased robustness to the decision threshold, although this limits the range of possible operating points in some conditions. Late fusion provides a solution to this issue.
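The abstract states that each stationary segment is described by one or several audio words obtained by applying product quantization to standard features. The sketch below illustrates that idea under stated assumptions. It assumes each segment is summarized by the mean of its frame-level feature vectors and that the per-sub-space codebooks have already been learned (for example with k-means). Product quantization then splits the vector into equal-length sub-vectors and maps each one to its nearest centroid, which yields one audio word per sub-space. The helper names (`segment_mean`, `product_quantize`) and the toy codebooks are hypothetical and do not come from the paper. The actual features, codebook sizes and word-assignment scheme used by the authors may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


def segment_mean(frames: Sequence[Vector]) -> List[float]:
    """Summarize a stationary segment by averaging its frame-level features (assumption)."""
    if not frames:
        raise ValueError("a segment must contain at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_into_subvectors(feature: Vector, num_subspaces: int) -> List[List[float]]:
    """Cut a feature vector into `num_subspaces` contiguous sub-vectors of equal length."""
    if num_subspaces <= 0 or len(feature) % num_subspaces != 0:
        raise ValueError("feature length must be a positive multiple of num_subspaces")
    size = len(feature) // num_subspaces
    return [list(feature[m * size:(m + 1) * size]) for m in range(num_subspaces)]


def product_quantize(feature: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Map each sub-vector to the index of its nearest centroid in the matching codebook.

    codebooks[m] holds the centroids of sub-space m (hypothetical here; normally
    learned offline, e.g. with k-means). The result is one audio word per sub-space.
    """
    subvectors = split_into_subvectors(feature, len(codebooks))
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(sub, codebook[k]))
        for sub, codebook in zip(subvectors, codebooks)
    ]


# Toy example: 4-dimensional features split into two 2-dimensional sub-spaces.
codebooks = [
    [[0.0, 1.0], [1.0, 0.0]],                # sub-space 0: 2 centroids
    [[5.0, 5.0], [0.0, 0.0], [10.0, 10.0]],  # sub-space 1: 3 centroids
]
frames = [[0.8, 0.0, 9.0, 9.0], [1.0, 0.2, 9.0, 10.0]]
words = product_quantize(segment_mean(frames), codebooks)
print(words)  # [1, 2]
```

With M sub-spaces, each segment receives M words, each drawn from a small codebook. The joint representation can nevertheless distinguish up to the product of the codebook sizes (here 2 × 3 = 6 combinations), which is the usual motivation for product quantization over a single large codebook.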
Document type: Conference paper
CBMI - 11th International Workshop on Content Based Multimedia Indexing - 2013, Jun 2013, Veszprém, Hungary. 2013

Cited literature: 14 references

https://hal.inria.fr/hal-00822022
Contributor: Cédric Penet
Submitted on: Monday, 13 May 2013, 18:09:26
Last modified on: Thursday, 11 January 2018, 06:20:10
Document(s) archived on: Wednesday, 14 August 2013, 05:50:07

File

CBMI2013_CedricPENET_CameraRea...
File(s) produced by the author(s)

Identifiers

  • HAL Id: hal-00822022, version 1


Citation

Cédric Penet, Claire-Hélène Demarty, Guillaume Gravier, Patrick Gros. Audio Event Detection in Movies using Multiple Audio Words and Contextual Bayesian Networks. CBMI - 11th International Workshop on Content Based Multimedia Indexing - 2013, Jun 2013, Veszprém, Hungary. 2013. 〈hal-00822022〉


Metrics

Record views: 927
File downloads: 332