Stochastic Models for Multimodal Video Analysis
Abstract
This chapter presents video indexing with segment models (SMs), with the aim of achieving more efficient and versatile multimodal fusion. In segment models, the synchrony constraints between modalities can be relaxed so that they apply only at scene boundaries. Within each scene, each modality can then be processed at its native sampling rate and with its own model. We illustrate the many audiovisual integration possibilities that SMs offer in the context of tennis video structuring. We first briefly review the stochastic models that have been used for multimodal video analysis. We then present the task of tennis video structuring, along with the cues and related features that we want to incorporate into a stochastic model. We show how hidden Markov models (HMMs) can be used for multimodal integration, and then generalize the HMM approach within the segment model framework. Finally, we show that both frameworks can take the hierarchical structure of a tennis video into account, and we present a new decoding algorithm that exploits the score information displayed as text on screen.