Missing data mask estimation with frequency and temporal dependencies

Sébastien Demange; Christophe Cerisara; Jean-Paul Haton

doi:10.1016/j.csl.2008.02.002

Journal article, Computer Speech and Language, year: 2009

Sébastien Demange (Role: Author; PersonId: 833973)
Analysis, perception and recognition of speech

Christophe Cerisara (Role: Author; PersonId: 2353; IdHAL: christophe-cerisara; IdRef: 102700168)
Analysis, perception and recognition of speech

Jean-Paul Haton (Role: Author; PersonId: 830987)
Analysis, perception and recognition of speech

Abstract

Automatic speech recognition (ASR) has reached a very high level of performance in controlled conditions. However, performance degrades drastically when environmental noise occurs during recognition, and the major challenge today is to achieve good robustness to adverse conditions. Missing data recognition has been developed to address this challenge. Unlike other denoising methods, missing data recognition does not match the whole data against the acoustic models; instead, it treats part of the signal as missing, i.e. corrupted by noise. The main difficulty of this approach is to accurately identify the missing parts (the so-called masks). The work reported here focuses on this issue. We start by developing Bayesian models of the masks, in which every spectral feature is classified as reliable or masked and is assumed to be independent of the rest of the signal. This classification strategy results in sparse, isolated masked features, like the squares of a chessboard, whereas oracle reliable and unreliable features tend to be clustered into consistent time–frequency blocks. We then propose to take frequency and temporal dependencies into account in order to improve the accuracy of mask estimation. Integrating these dependencies leads to a new architecture for a missing data mask estimator. The proposed classifier has been evaluated on the noisy Aurora2 (digit recognition) and Aurora4 (continuous speech) databases. Experimental results show a significant improvement in recognition accuracy when these dependencies are considered.
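To make the contrast between independent and dependency-aware mask estimation concrete, here is a toy sketch. It is not the classifier proposed in the paper. It assumes that each time-frequency cell is summarised by a single scalar (a hypothetical local SNR estimate in dB), that each class (reliable, masked) is modelled by one Gaussian shared by all bands with made-up parameters, and it stands in for frequency and temporal dependencies with a simple 3x3 neighbourhood average of the posteriors. The helper names `independent_posteriors`, `smooth_with_neighbours` and `binarize` are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def independent_posteriors(snr, params, prior_reliable=0.5):
    """P(reliable | x) for every time-frequency cell, each cell classified alone.

    snr: list of frames, each a list of per-band values (hypothetical local SNR, dB).
    params: {"reliable": (mean, var), "masked": (mean, var)}, shared by all bands.
    """
    mean_r, var_r = params["reliable"]
    mean_m, var_m = params["masked"]
    posteriors = []
    for frame in snr:
        row = []
        for x in frame:
            log_r = math.log(prior_reliable) + gaussian_logpdf(x, mean_r, var_r)
            log_m = math.log(1.0 - prior_reliable) + gaussian_logpdf(x, mean_m, var_m)
            top = max(log_r, log_m)  # subtract the max for numerical stability
            p_r, p_m = math.exp(log_r - top), math.exp(log_m - top)
            row.append(p_r / (p_r + p_m))
        posteriors.append(row)
    return posteriors


def smooth_with_neighbours(post, self_weight=0.5):
    """Mix each posterior with the mean of its 3x3 time-frequency neighbours.

    A crude, hypothetical stand-in for frequency and temporal dependencies.
    """
    n_frames, n_bands = len(post), len(post[0])
    smoothed = []
    for t in range(n_frames):
        row = []
        for f in range(n_bands):
            neighbours = [
                post[tt][ff]
                for tt in range(max(0, t - 1), min(n_frames, t + 2))
                for ff in range(max(0, f - 1), min(n_bands, f + 2))
                if (tt, ff) != (t, f)
            ]
            context = sum(neighbours) / len(neighbours) if neighbours else post[t][f]
            row.append(self_weight * post[t][f] + (1.0 - self_weight) * context)
        smoothed.append(row)
    return smoothed


def binarize(post, threshold=0.5):
    """Hard mask: 1 = reliable, 0 = masked."""
    return [[1 if p >= threshold else 0 for p in row] for row in post]


if __name__ == "__main__":
    params = {"reliable": (10.0, 16.0), "masked": (-5.0, 16.0)}  # made-up values
    snr = [  # rows are frames, columns are frequency bands
        [12, 11, 13, 12],
        [12, 1, 11, 13],  # one isolated low-SNR cell inside a clean region
        [11, 12, 12, 10],
        [-8, -6, -7, -9],  # a noisy frame
    ]
    post = independent_posteriors(snr, params)
    print(binarize(post))  # isolated 0 at frame 1, band 1
    print(binarize(smooth_with_neighbours(post)))  # isolated 0 removed, noisy frame stays 0
```

In this example the independent classifier produces exactly the kind of isolated masked cell the abstract describes, while the neighbourhood-smoothed posteriors yield a mask made of consistent blocks and still mask the noisy frame. A probabilistic model of the dependencies, as proposed in the article, would replace this fixed averaging.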

Keywords

Automatic speech recognition; Missing data recognition; Missing data mask estimation; Frequency and temporal dependencies

https://inria.hal.science/inria-00338397

Submitted on: Thursday 13 November 2008, 09:37:49

Last modified on: Thursday 15 February 2024, 03:32:16

Dates and versions

inria-00338397, version 1 (13-11-2008)

Identifiers

HAL Id: inria-00338397, version 1
DOI: 10.1016/j.csl.2008.02.002

Cite

Sébastien Demange, Christophe Cerisara, Jean-Paul Haton. Missing data mask estimation with frequency and temporal dependencies. Computer Speech and Language, 2009, 23 (1), pp.25-41. ⟨10.1016/j.csl.2008.02.002⟩. ⟨inria-00338397⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA UNIV-LORRAINE INRIA2 LORIA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM