Audio-visual Tracking by Density Approximation in a Sequential Bayesian Filtering Framework

Israel Gebru (1), Christine Evers (2), Patrick Naylor (2), Radu Horaud (1)
(1) PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: This paper proposes a novel audio-visual tracking approach that constructively exploits the audio and visual modalities to estimate the trajectories of multiple people in a joint state space. The tracking problem is modeled within a sequential Bayesian filtering framework, in which we propose to represent the posterior density with a Gaussian Mixture Model (GMM). To ensure that a GMM representation can be retained sequentially over time, the predictive density is approximated by a GMM using the Unscented Transform, while a density interpolation technique is introduced to obtain a continuous representation of the observation likelihood, which is also a GMM. Furthermore, to prevent the number of mixture components from growing exponentially over time, a density approximation based on the Expectation Maximization (EM) algorithm is applied, resulting in a compact GMM representation of the posterior density. The proposed approach is evaluated on recordings made with a camcorder and a microphone array, and shows significant improvements in tracking performance over two benchmark visual trackers.
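To illustrate the prediction step mentioned in the abstract, the following is a minimal sketch of how an Unscented Transform can carry each component of a GMM through a nonlinear motion model so that the predictive density stays a GMM. It is not the authors' implementation: the one-dimensional state, the function names `unscented_transform_1d` and `predict_gmm`, the default `kappa`, and the additive process noise are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

# A mixture component is (weight, mean, variance); 1-D state is an assumption.
Component = Tuple[float, float, float]


def unscented_transform_1d(
    mean: float, var: float, f: Callable[[float], float], kappa: float = 2.0
) -> Tuple[float, float]:
    """Approximate the mean and variance of f(x) for x ~ N(mean, var)."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    # Three sigma points: the mean and one point on each side of it.
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def predict_gmm(
    components: List[Component],
    f: Callable[[float], float],
    process_var: float,
    kappa: float = 2.0,
) -> List[Component]:
    """Propagate every component through f; additive noise is assumed."""
    predicted = []
    for weight, mean, var in components:
        m, v = unscented_transform_1d(mean, var, f, kappa)
        predicted.append((weight, m, v + process_var))
    return predicted
```

Because each component is propagated independently and keeps its weight, the number of components is unchanged by this step. The growth that the abstract addresses with EM-based density approximation arises elsewhere in the recursion and is not covered by this sketch.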
Document type:
Conference paper
IEEE Workshop on Hands-free Speech Communication and Microphone Arrays, Mar 2017, San Francisco, CA, United States. IEEE, pp.71-75, 2017, 〈http://hscma2017.org/〉. 〈10.1109/HSCMA.2017.7895564〉

https://hal.inria.fr/hal-01452167
Contributor: Team Perception
Submitted on: Friday, 17 March 2017 - 15:07:06
Last modified on: Thursday, 24 May 2018 - 17:12:37
Document(s) archived on: Sunday, 18 June 2017 - 13:28:21

Files

main.pdf
Files produced by the author(s)

Citation

Israel Gebru, Christine Evers, Patrick Naylor, Radu Horaud. Audio-visual Tracking by Density Approximation in a Sequential Bayesian Filtering Framework. IEEE Workshop on Hands-free Speech Communication and Microphone Arrays, Mar 2017, San Francisco, CA, United States. IEEE, pp.71-75, 2017, 〈http://hscma2017.org/〉. 〈10.1109/HSCMA.2017.7895564〉. 〈hal-01452167〉
