Multi-microphone speech recognition in everyday environments

Abstract: Multi-microphone signal processing techniques have the potential to greatly improve the robustness of automatic speech recognition (ASR) in distant-microphone settings. However, in everyday environments, typified by complex non-stationary noise backgrounds, designing effective multi-microphone speech recognition systems is non-trivial. In particular, optimal performance requires tight integration of the front-end signal processing and the back-end statistical speech and noise source modelling. The best way to achieve this in a modern deep learning speech recognition framework remains unclear. Further, variability in microphone array design (and the consequent lack of real training data for any particular configuration) may mean that systems have to generalise from audio captured using mismatched microphone geometries or produced using simulation.
Document type:
Journal article
Computer Speech and Language, Elsevier, 2017, 46, pp.386-387. 〈10.1016/j.csl.2017.02.007〉

https://hal.inria.fr/hal-01483469
Contributor: Emmanuel Vincent
Submitted on: Sunday, March 5, 2017 - 23:55:18
Last modified on: Sunday, May 13, 2018 - 23:04:02
Document(s) archived on: Tuesday, June 6, 2017 - 12:24:21

File

vincent_CSL17.pdf
Files produced by the author(s)

Identifiers

Citation

Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe. Multi-microphone speech recognition in everyday environments. Computer Speech and Language, Elsevier, 2017, 46, pp.386-387. 〈10.1016/j.csl.2017.02.007〉. 〈hal-01483469〉

Metrics

Record views: 511
File downloads: 320