inria-00338397, version 1

Missing data mask estimation with frequency and temporal dependencies

Sébastien Demange (a, 1), Christophe Cerisara (b, 1), Jean-Paul Haton (1)

Computer Speech and Language 23, 1 (2009) 25-41

Abstract: Automatic speech recognition (ASR) has reached a very high level of performance in controlled conditions. However, performance degrades drastically when environmental noise is present during recognition. The major challenge today is to achieve good robustness to adverse conditions. Missing data recognition has been developed to address this challenge. Unlike other denoising methods, missing data recognition does not match all of the data against the acoustic models; instead, it treats part of the signal as missing, i.e. corrupted by noise. The main challenge of this approach is to accurately identify the missing parts (also called masks). The work reported here focuses on this issue. We start by developing Bayesian models of the masks, in which every spectral feature is classified as reliable or masked and is assumed to be independent of the rest of the signal. This classification strategy yields sparse, isolated masked features, like the squares of a chessboard, whereas oracle reliable and unreliable features tend to be clustered into consistent time–frequency blocks. We then propose taking frequency and temporal dependencies into account in order to improve mask estimation accuracy. Integrating these dependencies leads to a new architecture for a missing data mask estimator. The proposed classifier has been evaluated on the noisy Aurora2 (digit recognition) and Aurora4 (continuous speech) databases. Experimental results show a significant improvement in recognition accuracy when these dependencies are taken into account.
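To make the contrast between independent and dependency-aware mask estimation concrete, here is a minimal Python sketch. It illustrates only the temporal side of the idea and is not the authors' architecture. It assumes that a posterior P(reliable) is already available for every time–frequency feature (`post[t][f]`, a hypothetical input). `independent_mask` thresholds each feature on its own, while `temporal_mask` decodes each frequency band with Viterbi over a hypothetical two-state (masked/reliable) Markov chain whose self-transition probability `stay` is an illustrative parameter.

```python
import math
from typing import List

Mask = List[List[int]]  # mask[t][f]: 1 = reliable, 0 = masked


def independent_mask(post: List[List[float]], threshold: float = 0.5) -> Mask:
    """Classify every feature independently of its neighbours (assumed posteriors)."""
    return [[1 if p >= threshold else 0 for p in frame] for frame in post]


def _log_emit(p_reliable: float, state: int) -> float:
    # Assumption: the posterior is used as a scaled likelihood under a uniform prior.
    p = p_reliable if state == 1 else 1.0 - p_reliable
    return math.log(max(p, 1e-12))


def temporal_mask(post: List[List[float]], stay: float = 0.9) -> Mask:
    """Viterbi decoding of a two-state Markov chain along time, one band at a time.

    `stay` (hypothetical) is P(state_t == state_{t-1}); higher values favour
    contiguous masked/reliable segments over isolated, chessboard-like labels.
    """
    n_frames, n_bands = len(post), len(post[0])
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    mask = [[0] * n_bands for _ in range(n_frames)]
    for f in range(n_bands):
        # Uniform initial state distribution.
        score = [math.log(0.5) + _log_emit(post[0][f], s) for s in (0, 1)]
        backptr = []  # backptr[t - 1][s]: best previous state when in state s at t
        for t in range(1, n_frames):
            new_score, ptr = [], []
            for s in (0, 1):
                cands = [score[r] + (log_stay if r == s else log_switch) for r in (0, 1)]
                best = 0 if cands[0] >= cands[1] else 1
                new_score.append(cands[best] + _log_emit(post[t][f], s))
                ptr.append(best)
            score = new_score
            backptr.append(ptr)
        # Backtrack the best state sequence for this band.
        s = 0 if score[0] >= score[1] else 1
        for t in range(n_frames - 1, -1, -1):
            mask[t][f] = s
            if t > 0:
                s = backptr[t - 1][s]
    return mask
```

For a single band with posteriors `[[0.9], [0.9], [0.4], [0.9], [0.9]]`, `independent_mask` returns an isolated masked frame, `[[1], [1], [0], [1], [1]]`, while `temporal_mask` returns `[[1], [1], [1], [1], [1]]`; a sustained low-posterior run such as three consecutive `0.1` frames is still kept masked. Dependencies across frequency could be modelled analogously along the other axis; this sketch does not attempt that.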

  • a –  INRIA
  • b –  CNRS
  • 1 :  PAROLE (INRIA Lorraine - LORIA)
  • INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
  • Keywords: Automatic speech recognition – Missing data recognition – Missing data mask estimation – Frequency and temporal dependencies
 
  • oai:hal.inria.fr:inria-00338397
  • Submitted on: Thursday, 13 November 2008, 09:37:49
  • Last modified on: Monday, 17 November 2008, 09:45:28