An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Emmanuel Vincent; Shinji Watanabe; Aditya Arie Nugraha; Jon Barker; Ricard Marxer

doi:10.1016/j.csl.2016.11.005

Journal article. Computer Speech and Language. Year: 2017

Emmanuel Vincent

Role: Author
PersonId : 1256
IdHAL : emmanuelv
ORCID : 0000-0002-0183-7289
IdRef : 089360176

Speech Modeling for Facilitating Oral-Based Communication

Shinji Watanabe

Role: Author

Mitsubishi Electric Research Laboratories

Aditya Arie Nugraha

Role: Author
PersonId : 967049

Speech Modeling for Facilitating Oral-Based Communication

Jon Barker

Role: Author

University of Sheffield [Sheffield]

Ricard Marxer

Role: Author
PersonId : 19391
IdHAL : ricard-marxer
ORCID : 0000-0001-5099-5059
IdRef : 240437713

University of Sheffield [Sheffield]

Abstract

Speech enhancement and automatic speech recognition (ASR) are most often evaluated in matched (or multi-condition) settings where the acoustic conditions of the training data match (or cover) those of the test data. Few studies have systematically assessed the impact of acoustic mismatches between training and test data, especially concerning recent speech enhancement and state-of-the-art ASR techniques. In this article, we study this issue in the context of the CHiME-3 dataset, which consists of sentences spoken by talkers situated in challenging noisy environments and recorded using a 6-channel tablet-based microphone array. We provide a critical analysis of the results published on this dataset for various signal enhancement, feature extraction, and ASR backend techniques, and we perform a number of new experiments to separately assess the impact of different noise environments, different numbers and positions of microphones, and simulated vs. real data on speech enhancement and ASR performance. We show that, with the exception of minimum variance distortionless response (MVDR) beamforming, most algorithms perform consistently on real and simulated data and can benefit from training on simulated data. We also find that training on different noise environments and different microphones barely affects the ASR performance, especially when several environments are present in the training data: only the number of microphones has a significant impact. Based on these results, we introduce the CHiME-4 Speech Separation and Recognition Challenge, which revisits the CHiME-3 dataset and makes it more challenging by reducing the number of microphones available for testing.

Keywords

speech enhancement; microphone array; robust ASR; train/test mismatch

Domains

Signal and Image Processing [eess.SP]

Main file

vincent_CSL16.pdf (715.96 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01399180

Submitted on: Friday, November 18, 2016, 14:05:48

Last modified on: Thursday, February 1, 2024, 10:06:10

Long-term archiving on: Monday, March 20, 2017, 16:46:06

Dates and versions

hal-01399180, version 1 (18-11-2016)

Identifiers

HAL Id: hal-01399180, version 1
DOI: 10.1016/j.csl.2016.11.005

Cite

Emmanuel Vincent, Shinji Watanabe, Aditya Arie Nugraha, Jon Barker, Ricard Marxer. An analysis of environment, microphone and data simulation mismatches in robust speech recognition. Computer Speech and Language, 2017, 46, pp.535-557. ⟨10.1016/j.csl.2016.11.005⟩. ⟨hal-01399180⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES SILECS UR1-MATH-NUM
