Modeling inter-speaker variability in speech recognition

Abstract: This paper describes a method for accounting for the influence of variability in HMM-based speech recognition. The set of Gaussian components of the mixtures represents the entire acoustic space covered across all possible variability values. For each utterance to be recognized, the corresponding variability value is estimated and used to dynamically weight and/or constrain the acoustic space for each probability density function (pdf). To this end, the weight coefficients of the Gaussian mixtures are made dependent on the variability value. As an example, the variability considered here is inter-speaker variability, which is handled through speaker classes. Taking into account, for each utterance, the four speaker classes that best match the utterance signal leads to a significant reduction in word error rate on a continuous speech recognition task, compared with standard speaker-independent modeling.
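The idea of class-dependent mixture weights over a shared set of Gaussian components can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy parameters, and in particular the uniform averaging of weights over the selected classes are assumptions made for illustration only; the abstract does not say how the best-matching classes are combined.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def top_classes(class_scores, n_best=4):
    """Return the n_best class ids with the highest utterance-level scores."""
    return sorted(class_scores, key=class_scores.get, reverse=True)[:n_best]


def utterance_weights(class_weights, selected):
    """Combine class-dependent mixture weights for one utterance.

    Assumption: a plain average over the selected classes. A component whose
    weight is zero in every selected class drops out, which constrains the
    acoustic space for this utterance.
    """
    n_components = len(class_weights[selected[0]])
    return [
        sum(class_weights[c][j] for c in selected) / len(selected)
        for j in range(n_components)
    ]


def log_pdf(x, means, variances, weights):
    """Log-likelihood of x under the shared components with the given weights."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
        if w > 0.0
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy setup (hypothetical values): two shared 1-D components, three classes.
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]
class_weights = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}
class_scores = {"A": -10.0, "B": -50.0, "C": -12.0}  # per-utterance match scores

selected = top_classes(class_scores, n_best=2)
weights = utterance_weights(class_weights, selected)
score = log_pdf([0.0], means, variances, weights)
```

In this sketch the Gaussian means and variances are shared by all classes, and only the weight vector changes from one utterance to the next. With `n_best=4`, the selection step corresponds to the four best-matching speaker classes mentioned in the abstract.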
Document type:
Conference paper
ICASSP-2008 (IEEE International Conference on Acoustics, Speech and Signal Processing, 2008), Mar 2008, Las Vegas, United States. 2008, 〈10.1109/ICASSP.2008.4518663〉

https://hal.inria.fr/inria-00616509
Contributor: Denis Jouvet
Submitted on: Monday, 22 August 2011 - 18:02:22
Last modified on: Wednesday, 8 November 2017 - 18:46:02


Citation

Gwenael Cloarec, Denis Jouvet. Modeling inter-speaker variability in speech recognition. ICASSP-2008 (IEEE International Conference on Acoustics, Speech and Signal Processing, 2008), Mar 2008, Las Vegas, United States. 2008, 〈10.1109/ICASSP.2008.4518663〉. 〈inria-00616509〉
