Journal article, Computer Speech and Language, 2009

Missing data mask estimation with frequency and temporal dependencies

Sébastien Demange, Christophe Cerisara, Jean-Paul Haton

Abstract

Automatic speech recognition (ASR) has reached a very high level of performance in controlled conditions. However, performance degrades drastically when environmental noise is present during recognition, and achieving robustness to adverse conditions is now the major challenge. Missing data recognition has been developed to address this challenge. Unlike other denoising methods, missing data recognition does not match the whole signal against the acoustic models; instead, it treats part of the signal as missing, i.e. corrupted by noise. The main difficulty of this approach is to accurately identify the missing parts (also called masks). The work reported here focuses on this issue. We start by developing Bayesian models of the masks, in which every spectral feature is classified as reliable or masked and is assumed to be independent of the rest of the signal. This classification strategy yields sparse, isolated masked features, like the squares of a chessboard, whereas oracle reliable and unreliable features tend to cluster into consistent time–frequency blocks. We therefore propose to take frequency and temporal dependencies into account in order to improve mask estimation accuracy. Integrating these dependencies leads to a new architecture for the missing data mask estimator. The proposed classifier has been evaluated on the noisy Aurora2 (digit recognition) and Aurora4 (continuous speech) databases. Experimental results show a significant improvement in recognition accuracy when these dependencies are considered.
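The contrast the abstract draws between independent classification and classification that uses time–frequency context can be illustrated with a toy sketch. This is not the estimator described in the article: the neighbourhood average below is only a crude stand-in for modelling dependencies, and every name, grid size and probability value is a hypothetical choice made for illustration. The convention assumed here is 1 = reliable, 0 = masked.

```python
# Toy illustration only: all names and numbers are hypothetical,
# and the neighbourhood average is NOT the article's model.

def toy_example(n_freq=6, n_time=12):
    """Build an oracle mask made of two blocks and a noisy per-cell posterior."""
    half = n_time // 2
    # Oracle: first half of the frames reliable (1), second half masked (0).
    oracle = [[1 if t < half else 0 for t in range(n_time)] for _ in range(n_freq)]
    # Assumed posterior probability of "reliable" for each cell.
    posterior = [[0.7 if cell else 0.3 for cell in row] for row in oracle]
    # A few isolated cells where the per-cell evidence is misleading.
    for f, t in ((1, 2), (4, 3)):
        posterior[f][t] = 0.2
    for f, t in ((2, 9), (4, 8)):
        posterior[f][t] = 0.8
    return oracle, posterior

def independent_mask(posterior, threshold=0.5):
    """Classify every time-frequency cell on its own."""
    return [[1 if p > threshold else 0 for p in row] for row in posterior]

def neighbourhood_mask(posterior, threshold=0.5):
    """Classify each cell from the mean posterior of the cell and its
    immediate frequency and time neighbours."""
    n_freq, n_time = len(posterior), len(posterior[0])
    mask = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            values = [posterior[f][t]]
            for df, dt in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ff, tt = f + df, t + dt
                if 0 <= ff < n_freq and 0 <= tt < n_time:
                    values.append(posterior[ff][tt])
            row.append(1 if sum(values) / len(values) > threshold else 0)
        mask.append(row)
    return mask

def count_errors(mask, oracle):
    """Number of cells where the estimated mask differs from the oracle."""
    return sum(m != o for mr, orow in zip(mask, oracle) for m, o in zip(mr, orow))

oracle, posterior = toy_example()
print("independent errors:  ", count_errors(independent_mask(posterior), oracle))
print("neighbourhood errors:", count_errors(neighbourhood_mask(posterior), oracle))
```

In this toy setting the independent classifier leaves isolated misclassified cells inside otherwise consistent blocks, while using the neighbouring cells removes them. Plain averaging would blur genuinely fine structure in a real mask, which is one reason to treat this only as an intuition for why dependencies help.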

Dates and versions

inria-00338397 , version 1 (13-11-2008)

Identifiers

HAL Id: inria-00338397
DOI: 10.1016/j.csl.2008.02.002

Cite

Sébastien Demange, Christophe Cerisara, Jean-Paul Haton. Missing data mask estimation with frequency and temporal dependencies. Computer Speech and Language, 2009, 23 (1), pp.25-41. ⟨10.1016/j.csl.2008.02.002⟩. ⟨inria-00338397⟩