Voice Activity Detection Based on Statistical Likelihood Ratio With Adaptive Thresholding

Xiaofei Li 1 Radu Horaud 1 Laurent Girin 1, 2 Sharon Gannot 3
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 GIPSA-CRISSP - CRISSP
GIPSA-DPC - Département Parole et Cognition
Abstract: The statistical likelihood ratio test is a widely used voice activity detection (VAD) method, in which the likelihood ratio of the current temporal frame is compared with a threshold. A fixed threshold is usually used, but a single value is not suitable across different types of noise. In this paper, an adaptive threshold is proposed as a function of the local statistics of the likelihood ratio. This threshold represents the upper bound of the likelihood ratio for non-speech frames, while generally remaining lower than the likelihood ratio for speech frames. As a result, a high non-speech hit rate can be achieved while keeping the speech hit rate as high as possible.
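The idea of comparing each frame's likelihood ratio with a threshold derived from local statistics can be illustrated with a minimal sketch. The abstract does not give the threshold function, so everything below is an assumption made for illustration and not the authors' rule: the threshold is taken as the mean plus `k` standard deviations of recent frames labeled non-speech, the first `warmup` frames are assumed to be non-speech, and the names `adaptive_lr_vad`, `window`, `k`, and `warmup` are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def adaptive_lr_vad(log_lr, window=20, k=3.0, warmup=3):
    """Label each frame as speech (True) or non-speech (False).

    log_lr : per-frame log likelihood ratios from any statistical model.
    window : number of recent non-speech frames kept for local statistics.
    k      : standard deviations added to the local mean (assumed rule).
    warmup : leading frames assumed to be non-speech (assumed convention).
    """
    if warmup < 1:
        raise ValueError("warmup must be at least 1")

    history = deque(maxlen=window)  # recent non-speech likelihood ratios
    decisions = []
    for i, value in enumerate(log_lr):
        if i < warmup:
            is_speech = False
        else:
            # Assumed adaptive threshold: an estimate of the upper bound
            # of the likelihood ratio over recent non-speech frames.
            threshold = mean(history) + k * pstdev(history)
            is_speech = value > threshold
        decisions.append(is_speech)
        if not is_speech:
            history.append(value)  # only non-speech frames update the statistics
    return decisions


# Two hypothetical sequences with different noise levels.
quiet = [0.1, 0.2, 0.1, 0.15, 0.12, 5.0, 6.0, 0.1, 0.2]
noisy = [2.0, 2.2, 1.9, 2.1, 2.0, 9.0, 2.1]
quiet_labels = adaptive_lr_vad(quiet)
noisy_labels = adaptive_lr_vad(noisy)
```

In this sketch a fixed threshold of 1.0 would label every frame of `noisy` as speech, whereas the adaptive threshold follows the non-speech level of each sequence and flags only the frames that stand clearly above it.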

https://hal.inria.fr/hal-01349776
Contributor: Team Perception
Submitted on: Thursday, July 28, 2016 - 4:36:05 PM
Last modification on: Wednesday, April 11, 2018 - 1:59:13 AM
Long-term archiving on: Saturday, October 29, 2016 - 10:40:28 AM

File

vad_slr.pdf
Files produced by the author(s)

Citation

Xiaofei Li, Radu Horaud, Laurent Girin, Sharon Gannot. Voice Activity Detection Based on Statistical Likelihood Ratio With Adaptive Thresholding. International Workshop on Acoustic Signal Enhancement (IWAENC), Sep 2016, Xi'an, China. pp.1-5, ⟨10.1109/IWAENC.2016.7602911⟩. ⟨hal-01349776⟩
