Missing data mask estimation with frequency and temporal dependencies

Sébastien Demange (1), Christophe Cerisara (1), Jean-Paul Haton (1)
(1) PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Automatic speech recognition (ASR) has reached a very high level of performance in controlled conditions. However, performance degrades drastically when environmental noise is present during recognition, so the major challenge today is to achieve robustness to adverse conditions. Missing data recognition has been developed to address this challenge. Unlike other denoising methods, missing data recognition does not match all of the data against the acoustic models; instead, it treats part of the signal as missing, i.e. corrupted by noise. The main difficulty of this approach is to accurately identify the missing parts (the so-called masks). The work reported here focuses on this issue. We start by developing Bayesian models of the masks, in which every spectral feature is classified as reliable or masked and is assumed to be independent of the rest of the signal. This classification strategy yields sparse, isolated masked features, like the squares of a chessboard, whereas oracle reliable and unreliable features tend to be clustered into consistent time–frequency blocks. We then propose to take frequency and temporal dependencies into account in order to improve the accuracy of mask estimation. Integrating such dependencies leads to a new architecture for a missing data mask estimator. The proposed classifier has been evaluated on the noisy Aurora2 (digit recognition) and Aurora4 (continuous speech) databases. Experimental results show a significant improvement in recognition accuracy when these dependencies are taken into account.
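The abstract contrasts missing data recognition with approaches that match all of the data against the acoustic models. One common way to use a mask during decoding is marginalization: with diagonal-covariance Gaussian state models, unreliable components are integrated out, which simply removes them from the log-likelihood. The Python sketch below illustrates this under that assumption only; it is not taken from the paper, and the function names and example values are hypothetical. Bounded marginalization, which constrains the hidden clean value to lie below the observed noisy value, is a common refinement that is not shown here.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def marginal_log_likelihood(frame, mask, means, variances):
    """Log-likelihood of one spectral frame under a diagonal-covariance Gaussian,
    computed over the reliable components only (mask[i] is True).

    Because the covariance is diagonal, the components are independent given the
    state; integrating an unreliable component over the real line contributes a
    factor of 1, so it is simply left out of the sum.
    """
    return sum(
        log_gauss(x, m, v)
        for x, reliable, m, v in zip(frame, mask, means, variances)
        if reliable
    )

# Hypothetical example: the second channel is dominated by noise.
frame = [1.0, 5.0, 0.5]
mask = [True, False, True]
means = [1.0, 0.0, 0.5]
variances = [1.0, 1.0, 1.0]
print(marginal_log_likelihood(frame, mask, means, variances))  # ≈ -1.838, i.e. -log(2*pi)
```

Keeping the noisy second component would pull the score down sharply (its value lies five standard deviations from the mean), which is exactly the mismatch the mask is meant to avoid.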
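The abstract contrasts independent per-feature classification, which yields isolated chessboard-like masked features, with estimation that exploits temporal and frequency dependencies. The Python sketch below illustrates that contrast on a single frequency channel. It is not the estimator architecture proposed in the paper: it swaps in a much simpler, standard technique, a two-state Markov chain over time decoded with the Viterbi algorithm. The per-frame posteriors P(reliable), the self-transition probability `stay`, and the function names are hypothetical.

```python
import math

def independent_mask(posteriors, threshold=0.5):
    """Label each feature on its own: reliable if P(reliable) >= threshold."""
    return [p >= threshold for p in posteriors]

def viterbi_mask(posteriors, stay=0.9, eps=1e-12):
    """Most likely reliable/masked label sequence along time for one channel.

    The labels form a two-state Markov chain (0 = masked, 1 = reliable) whose
    self-transition probability `stay` expresses the assumption that a label
    tends to persist over consecutive frames. Per-frame posteriors serve as
    emission scores, which implicitly assumes a uniform prior over the labels.
    """
    if not 0.0 < stay < 1.0:
        raise ValueError("stay must lie strictly between 0 and 1")
    if not posteriors:
        return []
    log_trans = [[math.log(stay), math.log(1.0 - stay)],
                 [math.log(1.0 - stay), math.log(stay)]]

    def emit(p, state):
        # Log score of observing posterior p in the given state.
        return math.log(max(p if state == 1 else 1.0 - p, eps))

    # Uniform initial distribution over the two states.
    score = [math.log(0.5) + emit(posteriors[0], s) for s in (0, 1)]
    backpointers = []
    for p in posteriors[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + emit(p, s))
            ptr.append(prev)
        score = new_score
        backpointers.append(ptr)

    # Backtrack from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return [s == 1 for s in reversed(path)]

# Hypothetical posteriors for one frequency channel over nine frames.
posteriors = [0.9, 0.8, 0.45, 0.85, 0.9, 0.2, 0.1, 0.55, 0.15]
print(independent_mask(posteriors))  # [True, True, False, True, True, False, False, True, False]
print(viterbi_mask(posteriors))      # [True, True, True, True, True, False, False, False, False]
```

Raising `stay` towards 1 penalizes label changes more heavily and produces longer uniform runs; `stay=0.5` makes the transitions uninformative and recovers the independent decisions. The same chain could be run along the frequency axis within a frame, but capturing frequency and temporal dependencies together requires a richer model than this per-channel sketch.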
Document type: Journal article
Computer Speech and Language, Elsevier, 2009, 23 (1), pp. 25-41. doi:10.1016/j.csl.2008.02.002

Contributor: Christophe Cerisara
Submitted on: Thursday, 13 November 2008, 09:37:49
Last modified on: Thursday, 11 January 2018, 06:19:56




Sébastien Demange, Christophe Cerisara, Jean-Paul Haton. Missing data mask estimation with frequency and temporal dependencies. Computer Speech and Language, Elsevier, 2009, 23 (1), pp. 25-41. doi:10.1016/j.csl.2008.02.002. HAL Id: inria-00338397


