Conference Papers, Year: 2021

Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

Sylvain Guy (1), Stéphane Lathuilière (1, 2, 3, 4), Pablo Mesejo (5, 1), Radu Horaud (6)

Abstract

Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient, either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large in-the-wild datasets – WildVVAD – based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset.
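The annotation methodology summarized in the abstract lends itself to a short sketch: transfer audio-VAD decisions to face tracks in video segments where a single face is visible. The code below is a minimal, hypothetical reconstruction from the abstract alone, not the authors' implementation; the helpers run_audio_vad and detect_face_tracks, the FaceTrack structure, and the 90% speech-overlap threshold are all illustrative assumptions.

```python
# Hypothetical sketch of a WildVVAD-style automatic annotation pipeline:
# label face tracks as speaking / not speaking by transferring audio-VAD
# decisions to segments that contain exactly one visible face.
# run_audio_vad() and detect_face_tracks() are placeholders, not a real API.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FaceTrack:
    start: float  # track start time (seconds)
    end: float    # track end time (seconds)
    boxes: list   # per-frame face bounding boxes


def run_audio_vad(audio_path: str) -> List[Tuple[float, float]]:
    """Placeholder: return (start, end) intervals where speech is detected."""
    raise NotImplementedError


def detect_face_tracks(video_path: str) -> List[FaceTrack]:
    """Placeholder: detect and track faces, returning one track per person."""
    raise NotImplementedError


def overlap(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Length (seconds) of the intersection of two time intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def annotate(video_path: str, audio_path: str, min_len: float = 1.0):
    """Yield (track, label) pairs; label is True for 'speaking'.

    Only tracks that never co-occur with another face are kept, so the
    audio-VAD decision can be attributed to that face unambiguously.
    """
    speech = run_audio_vad(audio_path)
    tracks = detect_face_tracks(video_path)
    for t in tracks:
        # Skip tracks that temporally overlap another face track.
        if any(overlap((t.start, t.end), (o.start, o.end)) > 0
               for o in tracks if o is not t):
            continue
        duration = t.end - t.start
        if duration < min_len:
            continue
        spoken = sum(overlap((t.start, t.end), s) for s in speech)
        if spoken / duration > 0.9:       # assumed threshold
            yield t, True                 # positive sample: face + speech
        elif spoken == 0.0:
            yield t, False                # negative sample: face, no speech
        # ambiguous tracks (partial speech overlap) are discarded
```

A pipeline of this shape can label large video collections without manual annotation, since both A-VAD and face detection/tracking components are available off-the-shelf.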
Main file: GUY_ICPR2020_sub.pdf (6.41 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02882229, version 1 (26-06-2020)
hal-02882229, version 2 (23-09-2020)
hal-02882229, version 3 (16-10-2020)
hal-02882229, version 4 (16-10-2020)

Identifiers

HAL Id: hal-02882229
DOI: 10.1109/ICPR48806.2021.9412884

Cite

Sylvain Guy, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud. Learning Visual Voice Activity Detection with an Automatically Annotated Dataset. ICPR 2020 - 25th International Conference on Pattern Recognition, Jan 2021, Milano, Italy. pp.4851-4856, ⟨10.1109/ICPR48806.2021.9412884⟩. ⟨hal-02882229v4⟩
488 views
566 downloads
