Conference papers

Using full-rank spatial covariance models for noise-robust ASR

Dung Tran 1 Emmanuel Vincent 1 Denis Jouvet 1 Kamil Adiloglu 2
1 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We present a joint spatial and spectral denoising front-end for Track 1 of the 2nd CHiME Speech Separation and Recognition Challenge, based on the Flexible Audio Source Separation Toolbox (FASST). We represent the sources by nonnegative matrix factorization (NMF) and full-rank spatial covariances, which are known to be appropriate for modeling small source movements. We then learn acoustic models for automatic speech recognition (ASR) on the enhanced training data. We obtain a 40% average reduction in error rate due to speech separation, compared to multicondition training alone.
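To make the source model in the abstract concrete, the following is a minimal NumPy sketch of the standard local Gaussian formulation, in which each source has an NMF spectral power v_j(f,n) = (W_j H_j)[f,n] and a full-rank spatial covariance R_j(f), and source images are estimated by multichannel Wiener filtering. It is an illustration under stated assumptions, not the FASST API or the authors' implementation: the function name `wiener_separate`, the array layouts, and the assumption that the parameters W, H and R have already been estimated are all hypothetical choices made for this example.

```python
import numpy as np

def wiener_separate(X, W, H, R):
    """Multichannel Wiener filtering under a local Gaussian model (sketch).

    X : (F, N, I) complex mixture STFT (F bins, N frames, I channels)
    W : list of J arrays (F, K), nonnegative NMF spectral bases
    H : list of J arrays (K, N), nonnegative NMF activations
    R : list of J arrays (F, I, I), full-rank spatial covariances
    Returns the estimated source images, shape (J, F, N, I).
    """
    # NMF spectral power of each source: v_j[f, n] = (W_j H_j)[f, n]
    V = np.stack([w @ h for w, h in zip(W, H)])            # (J, F, N)
    Rs = np.stack(R)                                       # (J, F, I, I)
    # Source covariance at each time-frequency bin: v_j(f, n) * R_j(f)
    Sigma = V[..., None, None] * Rs[:, :, None, :, :]      # (J, F, N, I, I)
    # Mixture covariance is the sum of the source covariances
    Sigma_x = Sigma.sum(axis=0)                            # (F, N, I, I)
    # Wiener gain G_j = Sigma_j Sigma_x^{-1}, applied to the mixture
    G = Sigma @ np.linalg.inv(Sigma_x)                     # (J, F, N, I, I)
    return np.einsum('jfnab,fnb->jfna', G, X)
```

Because the Wiener gains of all sources sum to the identity matrix, the estimated source images add up to the observed mixture, which is a convenient sanity check when experimenting with this kind of model.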

Cited literature: 5 references
Contributor: Emmanuel Vincent
Submitted on : Friday, March 15, 2013 - 10:59:07 AM
Last modification on : Saturday, October 16, 2021 - 11:26:08 AM
Long-term archiving on: Monday, June 17, 2013 - 2:22:18 PM




  • HAL Id : hal-00801162, version 1


Dung Tran, Emmanuel Vincent, Denis Jouvet, Kamil Adiloglu. Using full-rank spatial covariance models for noise-robust ASR. CHiME - 2nd International Workshop on Machine Listening in Multisource Environments - 2013, Jun 2013, Vancouver, Canada. pp.31-32. ⟨hal-00801162⟩


