Conference papers

Audio-Visual Speech-Turn Detection and Tracking

Israel Dejene Gebru 1 Silèye Ba 1 Georgios Evangelidis 1 Radu Horaud 1 
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract: Speaker diarization is an important component of multi-party dialog systems, where it assigns speech-signal segments to participants. Diarization may well be viewed as the problem of detecting and tracking speech turns. It is proposed to address this problem by modeling the spatial coincidence of visual and auditory observations and by combining this coincidence model with a dynamic Bayesian formulation that tracks the identity of the active speaker. Speech-turn tracking is formulated as a latent-variable temporal graphical model and an exact inference algorithm is proposed. We describe in detail an audiovisual discriminative observation model as well as a state-transition model. We also describe an implementation of a full system composed of multi-person visual tracking, sound-source localization and the proposed online diarization technique. Finally, we show that the proposed method yields promising results on two challenging scenarios that were carefully recorded and annotated.
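
As a rough illustration of what exact inference over a discrete latent speaker identity can look like, the sketch below runs a generic forward (filtering) recursion for a hidden Markov model. It is not the authors' model: the function name `forward_filter`, the three-state layout (state 0 for silence, states 1 and 2 for two tracked persons), the sticky transition matrix and the likelihood values are all assumptions made for illustration. In the paper the per-frame scores would come from the audio-visual observation model; here they are simply made-up numbers.

```python
import numpy as np


def forward_filter(likelihoods, transition, prior):
    """Online Bayesian filtering over a discrete latent speaker index.

    likelihoods: (T, K) array, likelihoods[t, k] = p(obs_t | speaker_t = k)
    transition:  (K, K) array, transition[i, j] = p(speaker_t = j | speaker_{t-1} = i)
    prior:       (K,) array, p(speaker_0)
    Returns a (T, K) array of posteriors p(speaker_t | obs_0, ..., obs_t).
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    transition = np.asarray(transition, dtype=float)
    belief = np.asarray(prior, dtype=float)
    posteriors = np.empty_like(likelihoods)
    for t, lik in enumerate(likelihoods):
        if t > 0:
            # Prediction step: propagate the previous posterior through the
            # state-transition model.
            belief = belief @ transition
        # Update step: weight by the observation likelihood and renormalize.
        belief = belief * lik
        belief = belief / belief.sum()
        posteriors[t] = belief
    return posteriors


# Hypothetical setup: state 0 = nobody speaks, states 1 and 2 = two persons.
# The "sticky" diagonal encodes the assumption that speech turns persist.
transition = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])
prior = np.full(3, 1.0 / 3.0)

# Made-up per-frame likelihoods standing in for an observation model.
likelihoods = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
])

posteriors = forward_filter(likelihoods, transition, prior)
active = posteriors.argmax(axis=1)  # most probable state per frame
```

Because each step uses only the previous posterior and the current observation, the recursion can run online, frame by frame, which matches the online setting mentioned in the abstract. The cost per frame is quadratic in the number of states.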
Contributor: Perception team
Submitted on: Monday, June 15, 2015 - 11:39:10 AM
Last modification on: Thursday, May 5, 2022 - 3:11:27 AM
Long-term archiving on: Tuesday, April 25, 2017 - 7:58:18 AM

Israel Dejene Gebru, Silèye Ba, Georgios Evangelidis, Radu Horaud. Audio-Visual Speech-Turn Detection and Tracking. 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015, Aug 2015, Liberec, Czech Republic. pp.143-151, ⟨10.1007/978-3-319-22482-4_17⟩. ⟨hal-01163659⟩