Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech

Arseniy Gorin 1, Denis Jouvet 1
1 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Speaker variability is a well-known problem of state-of-the-art Automatic Speech Recognition (ASR) systems. In particular, handling children's speech is challenging because of substantial differences in the pronunciation of speech units between adult and child speakers. To build accurate ASR systems for all types of speakers, Hidden Markov Models (HMM) with Gaussian mixture densities have been used intensively in combination with model adaptation techniques. This paper compares different ways to improve the recognition of children's speech and describes a novel approach relying on a Class-Structured Gaussian Mixture Model (GMM). A common solution for reducing speaker variability relies on gender and age adaptation. First, it is proposed to replace gender and age by unsupervised clustering, and the resulting speaker classes are used for adaptation of the conventional HMM. Second, speaker classes are used for initializing a structured GMM, where the components of the Gaussian densities are structured with respect to the speaker classes. In a first approach, the mixture weights of the structured GMM are made dependent on the speaker class. In a second approach, the mixture weights are replaced by explicit dependencies between Gaussian components of mixture densities (as in stranded GMMs, but here the GMMs are class-structured). The different approaches are evaluated and compared on the TIDIGITS task. The best improvement is achieved when the structured GMM is combined with feature adaptation.
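The first approach in the abstract (a structured GMM whose Gaussian components are shared by all speaker classes while the mixture weights depend on the class) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the one-dimensional features, the two speaker classes, the numeric means, variances and weights, and the names `gaussian_pdf`, `structured_gmm_likelihood` and `most_likely_class` are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical toy model: 2 speaker classes with 2 components each, giving
# 4 Gaussian components that are shared by all classes.
MEANS = [-2.0, -1.0, 1.0, 2.0]
VARS = [1.0, 1.0, 1.0, 1.0]

# Class-dependent mixture weights, one row per speaker class (toy values).
# Each class puts most of its mass on its own block of components.
CLASS_WEIGHTS = [
    [0.45, 0.45, 0.05, 0.05],  # speaker class 0
    [0.05, 0.05, 0.45, 0.45],  # speaker class 1
]


def structured_gmm_likelihood(x, speaker_class):
    """Likelihood of x under the shared Gaussians, weighted by the class row."""
    weights = CLASS_WEIGHTS[speaker_class]
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, MEANS, VARS))


def most_likely_class(x):
    """Index of the speaker class whose weights give x the highest likelihood."""
    return max(range(len(CLASS_WEIGHTS)),
               key=lambda c: structured_gmm_likelihood(x, c))
```

In this toy setting only the weight row changes between classes; the Gaussian means and variances are shared. The second approach in the abstract, which replaces the weights with dependencies between Gaussian components, is not covered by this sketch.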
Document type :
Conference papers

Cited literature [17 references]
Contributor : Denis Jouvet
Submitted on : Wednesday, December 3, 2014 - 3:43:19 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Archived on : Monday, March 9, 2015 - 5:50:16 AM


Files produced by the author(s)




Arseniy Gorin, Denis Jouvet. Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech. SLSP 2014, 2nd International Conference on Statistical Language and Speech Processing, Oct 2014, Grenoble, France. pp. 108-119, ⟨10.1007/978-3-319-11397-5_8⟩. ⟨hal-01090472⟩


