Audiovisual Integration with Segment Models for Tennis Video Parsing

Emmanouil Delakis 1, Guillaume Gravier 2,*, Patrick Gros 1
* Corresponding author
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Automatic video content analysis is an emerging research subject with numerous applications to large video databases and personal video recording systems. The aim of this study is to fuse multimodal information in order to automatically parse the underlying structure of tennis broadcasts. The frame-based observation distributions of Hidden Markov Models are too restrictive for modeling heterogeneous audiovisual data. We propose instead to use segmental features, within the framework of Segment Models, to overcome this limitation and to move the synchronization points to the segment boundaries. Because each segment is treated as a video scene, the auditory and visual features collected inside the scene boundaries can each be sampled at their native rates and modeled with their own models. Experiments on a 15-hour corpus of tennis video show that Segment Models with synchronous audiovisual fusion outperform Hidden Markov Models. Results with asynchronous fusion, however, are less encouraging.

https://hal.inria.fr/inria-00568073
Contributor: Patrick Gros
Submitted on: Tuesday, 22 February 2011 - 15:57:59
Last modified on: Thursday, 11 January 2018 - 06:20:10
