
Multimodal Structuring of Tennis Videos using Segment Models

Emmanouil Delakis 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Automatic video content analysis is an emerging research subject with numerous practical applications to large video databases or personal video recording systems. The focus of this study is the automatic construction of the table of contents of a tennis broadcast using Markovian models and dynamic programming. Motivated by the need for more efficient multimodal representations, the use of segmental features in the framework of Segment Models is proposed, instead of the frame-based features of Hidden Markov Models. Considering each scene of the video as a segment, the synchronization points between different modalities are extended to the scene boundaries, the scene being the basic thematic unit of the video. Visual features from the broadcast video and auditory features recorded on the court are processed before fusion in their own segments, with their own sampling rates and models. Various techniques for modeling the segments are examined, including discrete or continuous density Hidden Markov Models, bigram models, or connectionist scorers, operating on automatically extracted audiovisual features. Segment Models and Hidden Markov Models, with hierarchical or ergodic topologies, are built and compared on a corpus of 15 hours of tennis video. The model parameters are estimated on labeled data. Depending on the segmental scorer employed, asynchronous fusion with Segment Models can achieve the same level of performance as Hidden Markov Models. The fusion of the textual resources of the video, namely the score announcements, is also considered. To fully exploit their semantic content on the actual game evolution and to account for unacknowledged game events, a novel Viterbi decoding scheme is developed. It produces solutions that are consistent with the score announcements and thus yields a clear performance improvement of the system.

Cited literature : 119 references
Contributor : Patrick Gros
Submitted on : Thursday, October 7, 2010 - 2:12:04 PM
Last modification on : Tuesday, June 15, 2021 - 4:14:05 PM
Long-term archiving on : Monday, January 10, 2011 - 11:27:46 AM


  • HAL Id : tel-00524285, version 1


Emmanouil Delakis. Multimodal Structuring of Tennis Videos using Segment Models. Human-Computer Interaction [cs.HC]. Université Rennes 1, 2006. English. ⟨tel-00524285⟩


