Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech

Arseniy Gorin (1), Denis Jouvet (1)
(1) PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Speaker variability is a well-known problem of state-of-the-art Automatic Speech Recognition (ASR) systems. In particular, handling children's speech is challenging because of substantial differences in the pronunciation of speech units between adult and child speakers. To build accurate ASR systems for all types of speakers, Hidden Markov Models (HMMs) with Gaussian mixture densities have been used intensively in combination with model adaptation techniques. This paper compares different ways to improve the recognition of children's speech and describes a novel approach relying on a Class-Structured Gaussian Mixture Model (GMM). A common solution for reducing speaker variability relies on gender and age adaptation. First, it is proposed to replace gender and age by unsupervised clustering. Speaker classes are first used for adaptation of the conventional HMM. Second, speaker classes are used for initializing a structured GMM, where the components of the Gaussian densities are structured with respect to the speaker classes. In a first approach, the mixture weights of the structured GMM are set dependent on the speaker class. In a second approach, the mixture weights are replaced by explicit dependencies between Gaussian components of mixture densities (as in stranded GMMs, but here the GMMs are class-structured). The different approaches are evaluated and compared on the TIDIGITS task. The best improvement is achieved when the structured GMM is combined with feature adaptation.
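To make the first approach in the abstract concrete, the following is a minimal sketch of a GMM whose Gaussian components are shared across speaker classes while the mixture weights depend on the class. It is an illustration only, not the authors' implementation: the function names, the diagonal-covariance assumption, the two-component toy model, and all numeric values are hypothetical, and the speaker class is assumed to be known here rather than obtained by unsupervised clustering.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log density of x under each diagonal-covariance Gaussian (one per row)."""
    diff = x - means
    return -0.5 * (
        np.sum(np.log(2.0 * np.pi * variances), axis=1)
        + np.sum(diff * diff / variances, axis=1)
    )


def class_structured_loglik(x, means, variances, class_weights, speaker_class):
    """log p(x | class) = log sum_k w[class, k] * N(x; mean_k, var_k).

    The Gaussians are shared by all classes; only the weights change.
    """
    log_terms = np.log(class_weights[speaker_class]) + log_gauss_diag(
        x, means, variances
    )
    top = np.max(log_terms)  # log-sum-exp for numerical stability
    return top + np.log(np.sum(np.exp(log_terms - top)))


# Toy model (hypothetical values): component 0 is mostly used by class 0,
# component 1 mostly by class 1.
means = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.ones((2, 2))
class_weights = np.array([[0.9, 0.1], [0.1, 0.9]])  # each row sums to 1

x = np.array([0.2, -0.1])  # a frame close to component 0
ll_class0 = class_structured_loglik(x, means, variances, class_weights, 0)
ll_class1 = class_structured_loglik(x, means, variances, class_weights, 1)
print(ll_class0, ll_class1)
```

In this toy setting the frame lies near the component favoured by class 0, so its log-likelihood is higher under the class 0 weights than under the class 1 weights. The second approach in the abstract, which replaces the weights by dependencies between Gaussian components, is not covered by this sketch.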
Document type:
Conference paper
SLSP 2014, 2nd International Conference on Statistical Language and Speech Processing, Oct 2014, Grenoble, France. pp.108 - 119, 2014, 〈10.1007/978-3-319-11397-5_8〉
Cited literature: 17 references

https://hal.inria.fr/hal-01090472
Contributor: Denis Jouvet
Submitted on: Wednesday, 3 December 2014 - 15:43:19
Last modified on: Thursday, 11 January 2018 - 06:25:24
Document(s) archived on: Monday, 9 March 2015 - 05:50:16

File

ago_slsp_2014_v4-juin2014.pdf
Files produced by the author(s)

Citation

Arseniy Gorin, Denis Jouvet. Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech. SLSP 2014, 2nd International Conference on Statistical Language and Speech Processing, Oct 2014, Grenoble, France. pp.108 - 119, 2014, 〈10.1007/978-3-319-11397-5_8〉. 〈hal-01090472〉

Metrics

Record views

246

File downloads

202