Active-Speaker Detection and Localization with Microphones and Cameras Embedded into a Robotic Head

Jan Cech 1, Ravi Mittal 1, Antoine Deleforge 1, Jordi Sanchez-Riera 1, Xavier Alameda-Pineda 1, Radu Horaud 1
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: In this paper we present a method for detecting and localizing an active speaker, i.e., a speaker that emits a sound, by fusing visual reconstruction from a stereoscopic camera pair with sound-source localization from several microphones. Both the cameras and the microphones are embedded into the head of a humanoid robot. The proposed statistical fusion model associates 3D faces of potential speakers with 2D sound directions. The paper has two contributions: (i) a method that discretizes the two-dimensional space of all possible sound directions and accumulates evidence for each direction by estimating the time difference of arrival (TDOA) over all the microphone pairs, such that all the microphones are used simultaneously and symmetrically, and (ii) an audio-visual alignment method that maps 3D visual features onto 2D sound directions and onto TDOAs between microphone pairs. This makes it possible to implicitly represent both sensing modalities in a common audiovisual coordinate frame. Using simulated as well as real data, we quantitatively assess the robustness of the method against noise and reverberation, and we compare it with several other methods. Finally, we describe a real-time implementation of the proposed technique on a humanoid head embedding four microphones and two cameras, which enables natural human-robot interactive behavior.
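To make contribution (i) more concrete, the following is a minimal sketch of accumulating TDOA evidence over a discretized grid of sound directions. It is not the authors' implementation. It assumes a far-field source, a hypothetical tetrahedral layout of four microphones 0.1 m from the head centre, a 48 kHz sample rate, delays rounded to whole samples, and plain cross-correlation as the evidence measure. All names (`MICS`, `arrival_delays`, `localize`, `simulate`) are illustrative.

```python
import math
import random
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s
SAMPLE_RATE = 48000     # Hz (assumed)

# Hypothetical microphone positions in metres (regular tetrahedron, 0.1 m radius).
R = 0.1 / math.sqrt(3)
MICS = [(R, R, R), (R, -R, -R), (-R, R, -R), (-R, -R, R)]


def direction(az_deg, el_deg):
    """Unit vector pointing from the head centre towards the source."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def arrival_delays(az_deg, el_deg):
    """Far-field arrival delay at each microphone, in whole samples."""
    u = direction(az_deg, el_deg)
    return [
        round(-sum(m * c for m, c in zip(mic, u)) / SPEED_OF_SOUND * SAMPLE_RATE)
        for mic in MICS
    ]


def correlation_at_lag(x, y, lag):
    """Cross-correlation sum_n x[n] * y[n + lag] over the valid overlap."""
    start = max(0, -lag)
    stop = min(len(x), len(y) - lag)
    return sum(x[n] * y[n + lag] for n in range(start, stop))


def localize(signals, azimuths, elevations):
    """Return the grid direction with the most accumulated evidence."""
    best, best_score = None, -math.inf
    for az in azimuths:
        for el in elevations:
            d = arrival_delays(az, el)
            # Every microphone pair contributes at the TDOA this direction predicts.
            score = sum(
                correlation_at_lag(signals[i], signals[j], d[j] - d[i])
                for i, j in combinations(range(len(MICS)), 2)
            )
            if score > best_score:
                best, best_score = (az, el), score
    return best


def simulate(az_deg, el_deg, n_samples=1500, seed=0):
    """Synthetic test data: one white-noise source, delayed per microphone."""
    rng = random.Random(seed)
    pad = 32  # larger than any delay produced by this geometry
    source = [rng.uniform(-1.0, 1.0) for _ in range(n_samples + 2 * pad)]
    delays = arrival_delays(az_deg, el_deg)
    return [[source[pad + n - d] for n in range(n_samples)] for d in delays]


AZIMUTHS = range(-90, 91, 10)    # degrees
ELEVATIONS = range(-30, 31, 10)  # degrees
signals = simulate(40, 10)
estimate = localize(signals, AZIMUTHS, ELEVATIONS)
print(estimate)
```

Because each candidate direction is scored by summing over all six microphone pairs, no pair is privileged, which mirrors the "simultaneously and symmetrically" property stated in the abstract. The noise-free, anechoic simulation is only a sanity check and says nothing about the robustness results reported in the paper.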
Document type:
Conference paper
Humanoids 2013 - IEEE-RAS International Conference on Humanoid Robots, Oct 2013, Atlanta, United States. IEEE, pp.203-210, 2013, 〈10.1109/HUMANOIDS.2013.7029977〉

Cited literature [31 references]


https://hal.inria.fr/hal-00861465
Contributor: Team Perception
Submitted on: Thursday, September 12, 2013 - 17:38:01
Last modified on: Wednesday, April 11, 2018 - 01:59:35
Document(s) archived on: Friday, December 13, 2013 - 04:24:32

Files

main_final.pdf
Files produced by the author(s)

Citation

Jan Cech, Ravi Mittal, Antoine Deleforge, Jordi Sanchez-Riera, Xavier Alameda-Pineda, et al.. Active-Speaker Detection and Localization with Microphones and Cameras Embedded into a Robotic Head. Humanoids 2013 - IEEE-RAS International Conference on Humanoid Robots, Oct 2013, Atlanta, United States. IEEE, pp.203-210, 2013, 〈10.1109/HUMANOIDS.2013.7029977〉. 〈hal-00861465〉

Metrics

Record views

1169

File downloads

1506