Alignment of Binocular-Binaural Data Using a Moving Audio-Visual Target

Vasil Khalidov 1, Florence Forbes 2, Radu Horaud 3
2 MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
3 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: In this paper we address the problem of aligning visual (V) and auditory (A) data using a sensor composed of a camera pair and a microphone pair. The original contribution of the paper is a method that aligns AV data by estimating the 3D positions of the microphones in the visual-centred coordinate frame defined by the stereo camera pair. We exploit the fact that these two distinct data sets are conditioned by a common set of parameters, namely the (unknown) 3D trajectory of an AV object, and derive an EM-like algorithm that alternates between estimating the microphone-pair position and estimating the AV object trajectory. The proposed algorithm has several built-in features: it can deal with A and V observations that are misaligned in time, it estimates the reliability of the data, it is robust to outliers in both modalities, and its convergence is proven theoretically. We report experiments with both simulated and real data.
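To illustrate how the microphone positions and the 3D trajectory can jointly condition the auditory data, here is a minimal sketch. The abstract does not state the auditory observation model; this sketch assumes a time-difference-of-arrival model between the two microphones. The function names, the speed-of-sound constant, and the coordinates are hypothetical and are not taken from the paper.

```python
import math

# Assumed speed of sound in air (m/s); not a value taken from the paper.
SPEED_OF_SOUND = 343.0


def arrival_time_difference(source, mic_left, mic_right, c=SPEED_OF_SOUND):
    """Arrival-time difference (seconds) of a sound emitted at `source`.

    All positions are 3D points expressed in the same (camera-centred) frame.
    Positive values mean the sound reaches the right microphone first.
    """
    return (math.dist(source, mic_left) - math.dist(source, mic_right)) / c


def predicted_differences(trajectory, mic_left, mic_right):
    """Predict one arrival-time difference per point of a 3D trajectory."""
    return [arrival_time_difference(p, mic_left, mic_right) for p in trajectory]


if __name__ == "__main__":
    # Hypothetical microphone pair, 20 cm apart, and a short target trajectory.
    mic_l, mic_r = (-0.1, 0.0, 0.0), (0.1, 0.0, 0.0)
    path = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (1.0, 0.0, 1.0)]
    for point, delay in zip(path, predicted_differences(path, mic_l, mic_r)):
        print(point, f"{delay * 1e6:.1f} microseconds")
```

Under this assumed model, an alternating scheme of the kind the abstract describes would compare such predictions with measured auditory data: first to refine the microphone positions with the trajectory held fixed, then to refine the trajectory with the microphones held fixed.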
Document type:
Conference paper
MMSP 2013 - IEEE International Workshop on Multimedia Signal Processing, Sep 2013, Pula (Sardinia), Italy. IEEE, pp.242-247, 2013, 〈10.1109/MMSP.2013.6659295〉

https://hal.inria.fr/hal-00861482
Contributor: Team Perception
Submitted on: Friday, 4 October 2013 - 17:14:26
Last modified on: Friday, 24 November 2017 - 13:29:12
Document(s) archived on: Friday, 7 April 2017 - 06:53:11

Citation

Vasil Khalidov, Florence Forbes, Radu Horaud. Alignment of Binocular-Binaural Data Using a Moving Audio-Visual Target. MMSP 2013 - IEEE International Workshop on Multimedia Signal Processing, Sep 2013, Pula (Sardinia), Italy. IEEE, pp.242-247, 2013, 〈10.1109/MMSP.2013.6659295〉. 〈hal-00861482v3〉
