Using full-rank spatial covariance models for noise-robust ASR

Dung Tran 1, Emmanuel Vincent 1, Denis Jouvet 1, Kamil Adiloglu 2
1 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We present a joint spatial and spectral denoising front-end for Track 1 of the 2nd CHiME Speech Separation and Recognition Challenge, based on the Flexible Audio Source Separation Toolbox (FASST). We represent the sources by nonnegative matrix factorization (NMF) and full-rank spatial covariances, which are known to be appropriate for modeling small source movements. We then train acoustic models for automatic speech recognition (ASR) on the enhanced training data. Speech separation yields a 40% average error rate reduction compared to multicondition training alone.
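
The model named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard formulation of full-rank spatial covariance modeling, in which the STFT image of source j at frequency f and frame n is zero-mean Gaussian with covariance v_j(f, n) R_j(f), where v_j = W_j H_j is an NMF spectral variance and R_j(f) is a full-rank Hermitian matrix, and in which sources are recovered by multichannel Wiener filtering. The function names, array shapes, and regularization constant below are illustrative assumptions; this is not the FASST API or the authors' configuration, and parameter estimation (for example by EM) is omitted.

```python
import numpy as np


def nmf_variances(W, H):
    """Spectral variances v[f, n] = sum_k W[f, k] * H[k, n] (NMF model)."""
    return W @ H


def multichannel_wiener_filter(X, v, R, eps=1e-10):
    """Estimate source images from a multichannel mixture (hypothetical helper).

    X : (F, N, I) complex mixture STFT, I channels
    v : (J, F, N) nonnegative spectral variances of the J sources
    R : (J, F, I, I) full-rank Hermitian spatial covariances
    Returns C : (J, F, N, I) complex estimated source images.
    """
    n_channels = X.shape[-1]
    # Per-source covariance: Sigma_j[f, n] = v_j[f, n] * R_j[f]
    Sigma = v[..., None, None] * R[:, :, None, :, :]      # (J, F, N, I, I)
    # Mixture covariance is the sum over sources, lightly regularized
    # (eps is an assumed constant, only there to keep the inverse stable).
    Sigma_x = Sigma.sum(axis=0) + eps * np.eye(n_channels)  # (F, N, I, I)
    # Wiener gain per source: G_j = Sigma_j @ inv(Sigma_x)
    G = Sigma @ np.linalg.inv(Sigma_x)[None]               # (J, F, N, I, I)
    # Apply each gain matrix to the mixture vector at every (f, n)
    return np.einsum("jfnab,fnb->jfna", G, X)
```

Because the Wiener gains of all sources sum to (approximately) the identity matrix, the estimated source images add back up to the mixture, which is a convenient sanity check for any implementation of this filter.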
Document type:
Conference paper
CHiME - 2nd International Workshop on Machine Listening in Multisource Environments - 2013, Jun 2013, Vancouver, Canada. pp.31-32, 2013

https://hal.inria.fr/hal-00801162
Contributor: Emmanuel Vincent
Submitted on: Friday, March 15, 2013 - 10:59:07
Last modified on: Thursday, January 11, 2018 - 06:25:24
Document(s) archived on: Monday, June 17, 2013 - 14:22:18

File

tran_CHiME13.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00801162, version 1

Citation

Dung Tran, Emmanuel Vincent, Denis Jouvet, Kamil Adiloglu. Using full-rank spatial covariance models for noise-robust ASR. CHiME - 2nd International Workshop on Machine Listening in Multisource Environments - 2013, Jun 2013, Vancouver, Canada. pp.31-32, 2013. 〈hal-00801162〉
