Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech

Arseniy Gorin; Denis Jouvet

doi:10.1007/978-3-319-11397-5_8

Conference paper, Year: 2014

Arseniy Gorin

Role: Author
PersonId : 957227

Analysis, perception and recognition of speech

Denis Jouvet

Role: Author
PersonId : 15904
IdHAL : denis-jouvet
IdRef : 029418666

Analysis, perception and recognition of speech

Abstract

Speaker variability is a well-known problem of state-of-the-art Automatic Speech Recognition (ASR) systems. In particular, handling children's speech is challenging because of substantial differences in the pronunciation of speech units between adult and child speakers. To build accurate ASR systems for all types of speakers, Hidden Markov Models (HMMs) with Gaussian mixture densities have been used intensively in combination with model adaptation techniques. This paper compares different ways to improve the recognition of children's speech and describes a novel approach relying on a Class-Structured Gaussian Mixture Model (GMM). A common solution for reducing speaker variability relies on gender and age adaptation. First, it is proposed to replace gender and age by unsupervised clustering; the resulting speaker classes are used for adaptation of the conventional HMM. Second, the speaker classes are used for initializing a structured GMM, in which the components of the Gaussian densities are structured with respect to the speaker classes. In a first approach, the mixture weights of the structured GMM are set dependent on the speaker class. In a second approach, the mixture weights are replaced by explicit dependencies between Gaussian components of mixture densities (as in stranded GMMs, but here the GMMs are class-structured). The different approaches are evaluated and compared on the TIDIGITS task. The best improvement is achieved when the structured GMM is combined with feature adaptation.
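
The first approach mentioned in the abstract (mixture weights that depend on the speaker class) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes diagonal-covariance Gaussians whose means and variances are shared across classes, with only the weight vector changing per class, and all names (`log_gauss_diag`, `class_weighted_loglik`, `class_weights`) and toy values are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of x under each of K diagonal-covariance Gaussians."""
    d = x.shape[0]
    diff = x - means  # shape (K, D)
    return -0.5 * (d * np.log(2.0 * np.pi)
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff ** 2 / variances, axis=1))

def class_weighted_loglik(x, means, variances, class_weights, c):
    """log p(x | c) = log sum_k w[c, k] * N(x; mu_k, var_k).

    Assumption: Gaussian parameters are shared; only weights depend on class c.
    """
    log_comp = log_gauss_diag(x, means, variances) + np.log(class_weights[c])
    m = np.max(log_comp)  # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(log_comp - m)))

# Toy model: two components, two speaker classes (values are illustrative only).
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
class_weights = np.array([[0.9, 0.1],   # class 0 favours component 0
                          [0.1, 0.9]])  # class 1 favours component 1

x = np.array([3.0, 3.0])
ll = [class_weighted_loglik(x, means, variances, class_weights, c)
      for c in range(2)]
best_class = int(np.argmax(ll))
```

In this toy setting the observation sits on the mean of the second component, so the class whose weights favour that component yields the higher likelihood. The second approach in the abstract replaces these weights with dependencies between components and is not covered by this sketch.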

Keywords

speech recognition; unsupervised clustering; speaker class modeling; stochastic trajectory modeling

Domains

Signal and Image Processing [eess.SP]

Main file

ago_slsp_2014_v4-juin2014.pdf (280.38 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01090472

Submitted on: Wednesday, December 3, 2014, 15:43:19

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Monday, March 9, 2015, 05:50:16

Dates and versions

hal-01090472 , version 1 (03-12-2014)

Identifiers

HAL Id : hal-01090472 , version 1
DOI : 10.1007/978-3-319-11397-5_8

Cite

Arseniy Gorin, Denis Jouvet. Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech. SLSP 2014, 2nd International Conference on Statistical Language and Speech Processing, Oct 2014, Grenoble, France. pp.108 - 119, ⟨10.1007/978-3-319-11397-5_8⟩. ⟨hal-01090472⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD

187 views

311 downloads
