Skip to Main content Skip to Navigation

Uncertainty learning for noise robust ASR

Dung Tien Tran 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : This thesis focuses on noise robust automatic speech recognition (ASR). It includes two parts. First, we focus on better handling of uncertainty to improve the performance of ASR in a noisy environment. Second, we present a method to accelerate the training process of a neural network using an auxiliary function technique. In the first part, multichannel speech enhancement is applied to input noisy speech. The posterior distribution of the underlying clean speech is then estimated, as represented by its mean and its covariance matrix or uncertainty. We show how to propagate the diagonal uncertainty covariance matrix in the spectral domain through the feature computation stage to obtain the full uncertainty covariance matrix in the feature domain. Uncertainty decoding exploits this posterior distribution to dynamically modify the acoustic model parameters in the decoding rule. The uncertainty decoding rule simply consists of adding the uncertainty covariance matrix of the enhanced features to the variance of each Gaussian component. We then propose two uncertainty estimators based on fusion to nonparametric estimation, respectively. To build a new estimator, we consider a linear combination of existing uncertainty estimators or kernel functions. The combination weights are generatively estimated by mini- mizing some divergence with respect to the oracle uncertainty. The divergence measures used are weighted versions of Kullback-Leibler (KL), Itakura-Saito (IS), and Euclidean (EU) diver- gences. Due to the inherent nonnegativity of uncertainty, this estimation problem can be seen as an instance of weighted nonnegative matrix factorization (NMF). In addition, we propose two discriminative uncertainty estimators based on linear or nonlin- ear mapping of the generatively estimated uncertainty. This mapping is trained so as to maxi- mize the boosted maximum mutual information (bMMI) criterion. We compute the derivative of this criterion using the chain rule and optimize it using stochastic gradient descent. In the second part, we introduce a new learning rule for neural networks that is based on an auxiliary function technique without parameter tuning. Instead of minimizing the objective function, this technique consists of minimizing a quadratic auxiliary function which is recursively introduced layer by layer and which has a closed-form optimum. Based on the properties of this auxiliary function, the monotonic decrease of the new learning rule is guaranteed.
Document type :
Complete list of metadata

Cited literature [116 references]  Display  Hide  Download
Contributor : Dung Tran Connect in order to contact the contributor
Submitted on : Friday, December 18, 2015 - 3:49:58 PM
Last modification on : Saturday, October 16, 2021 - 11:26:09 AM
Long-term archiving on: : Saturday, March 19, 2016 - 2:20:50 PM


  • HAL Id : tel-01754497, version 2


Dung Tien Tran. Uncertainty learning for noise robust ASR. Sound [cs.SD]. Université de Lorraine, 2015. English. ⟨NNT : 2015LORR0236⟩. ⟨tel-01754497v2⟩



Les métriques sont temporairement indisponibles