Using full-rank spatial covariance models for noise-robust ASR

Dung Tran 1 Emmanuel Vincent 1 Denis Jouvet 1 Kamil Adiloglu 2
1 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : We present a joint spatial and spectral denoising front-end for Track 1 of the 2nd CHiME Speech Separation and Recognition Challenge, based on the Flexible Audio Source Separation Toolbox (FASST). We represent the sources by nonnegative matrix factorization (NMF) and full-rank spatial covariances, which are known to be appropriate for modeling small source movements. We then learn acoustic models for automatic speech recognition (ASR) on the enhanced training data. Speech separation yields a 40% average reduction in error rate compared to multicondition training alone.
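The source model named in the abstract can be illustrated with a minimal NumPy sketch. It is not FASST code and does not reproduce the authors' configuration: the function names, array shapes, and random test data are all hypothetical. It assumes the common local Gaussian formulation, in which each source has a time-frequency variance given by an NMF product and a time-invariant full-rank spatial covariance per frequency bin. The use of a multichannel Wiener filter for the separation step is also an assumption, since the abstract does not state how the enhanced signals are computed.

```python
import numpy as np

def source_variances(W, H):
    """NMF spectral model: v[f, n] = sum_k W[f, k] * H[k, n] (all nonnegative)."""
    return W @ H

def wiener_separate(X, variances, spatial_covs):
    """Multichannel Wiener filter under a local Gaussian model (illustrative only).

    X:            (F, N, I) complex mixture STFT with I channels
    variances:    list of J arrays of shape (F, N), e.g. from source_variances
    spatial_covs: list of J arrays of shape (F, I, I), Hermitian positive definite
    returns:      (J, F, N, I) estimated multichannel image of each source
    """
    # Covariance of source j in bin (f, n): v_j[f, n] * R_j[f]
    Sigma = np.stack([v[:, :, None, None] * R[:, None, :, :]
                      for v, R in zip(variances, spatial_covs)])  # (J, F, N, I, I)
    Sigma_x = Sigma.sum(axis=0)                                   # (F, N, I, I)
    # Wiener gain G_j = Sigma_j @ inv(Sigma_x), broadcast over the source axis
    G = Sigma @ np.linalg.inv(Sigma_x)
    # Apply each gain matrix to the mixture vector in every bin
    return np.einsum('jfnab,fnb->jfna', G, X)

# Hypothetical toy dimensions: frequency bins, frames, channels, NMF components, sources
rng = np.random.default_rng(0)
F, N, I, K, J = 4, 5, 2, 3, 2

def random_full_rank_cov():
    """Random Hermitian positive-definite (hence full-rank) matrix per frequency bin."""
    A = rng.standard_normal((F, I, I)) + 1j * rng.standard_normal((F, I, I))
    return A @ A.conj().transpose(0, 2, 1) + 0.1 * np.eye(I)

variances = [source_variances(rng.random((F, K)) + 0.1, rng.random((K, N)) + 0.1)
             for _ in range(J)]
spatial_covs = [random_full_rank_cov() for _ in range(J)]
X = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))

estimates = wiener_separate(X, variances, spatial_covs)
```

Because the Wiener gains of all sources sum to the identity matrix in every time-frequency bin, the estimated source images add back up to the mixture, which gives a quick sanity check on an implementation of this kind.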

https://hal.inria.fr/hal-00801162
Contributor : Emmanuel Vincent
Submitted on : Friday, March 15, 2013 - 10:59:07 AM
Last modification on : Thursday, June 6, 2019 - 3:40:03 PM
Long-term archiving on : Monday, June 17, 2013 - 2:22:18 PM

File

tran_CHiME13.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00801162, version 1

Citation

Dung Tran, Emmanuel Vincent, Denis Jouvet, Kamil Adiloglu. Using full-rank spatial covariance models for noise-robust ASR. CHiME - 2nd International Workshop on Machine Listening in Multisource Environments - 2013, Jun 2013, Vancouver, Canada. pp.31-32. ⟨hal-00801162⟩
