Component Structuring and Trajectory Modeling for Speech Recognition

Arseniy Gorin; Denis Jouvet

Conference paper, Year: 2014

Arseniy Gorin

Role: Author
PersonId: 957227

Analysis, perception and recognition of speech

Denis Jouvet

Role: Author
PersonId: 15904
IdHAL: denis-jouvet
IdRef: 029418666

Analysis, perception and recognition of speech

Abstract

When speech data are produced by speakers of different ages and genders, the acoustic variability of any given phonetic unit becomes large, which degrades speech recognition performance. One way to go beyond the conventional Hidden Markov Model (HMM) is to include speaker class information explicitly in the modeling. Speaker classes can be obtained by unsupervised clustering of the speech utterances. This paper introduces a structuring of the Gaussian components of the Gaussian mixture model (GMM) densities with respect to speaker classes. In a first approach, the structuring of the Gaussian components is combined with speaker-class-dependent mixture weights. In a second approach, the structuring is used with mixture transition matrices, which add dependencies between the Gaussian components of mixture densities (as in stranded GMMs). The different approaches are evaluated and compared in detail on the TIDIGITS task. Significant improvements are obtained with the proposed approaches based on structured components. Additional results are reported for phonetic decoding on the NEOLOGOS database, a large corpus of French telephone speech.
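
The abstract gives no formulas, so the following is only a rough sketch of the two modeling ideas under stated assumptions: diagonal-covariance Gaussians, a single HMM state, components shared by all speaker classes with class-dependent mixture weights, and the usual stranded-GMM formulation in which a mixture transition matrix conditions the component used at frame t on the component used at frame t-1. How the paper actually structures the components with respect to speaker classes is not modeled here. All function and parameter names (`class_weights`, `mix_trans`, and so on) are hypothetical; this is not the authors' implementation.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def class_weighted_loglik(x, means, variances, class_weights, speaker_class):
    """GMM log-likelihood of frame x with weights selected by speaker class.

    Assumed (hypothetical) layout: component k is shared by all classes,
    while class_weights[c][k] is the weight of component k for class c.
    Weights must be strictly positive because their logarithm is taken.
    """
    weights = class_weights[speaker_class]
    return logsumexp([
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])


def stranded_loglik(frames, means, variances, init_weights, mix_trans):
    """Log-likelihood of a frame sequence for one stranded-GMM state.

    mix_trans[l][k] is an assumed mixture transition probability
    P(component k at frame t | component l at frame t-1). HMM state
    transitions are ignored: the sketch keeps a single state.
    """
    n = len(means)
    # Initialisation: alpha_1(k) = w_k * N(x_1; mu_k, Sigma_k), in log domain.
    log_alpha = [
        math.log(init_weights[k])
        + log_gauss_diag(frames[0], means[k], variances[k])
        for k in range(n)
    ]
    # Recursion: alpha_t(k) = N(x_t; k) * sum_l alpha_{t-1}(l) * A[l][k].
    for x in frames[1:]:
        log_alpha = [
            log_gauss_diag(x, means[k], variances[k])
            + logsumexp([log_alpha[l] + math.log(mix_trans[l][k])
                         for l in range(n)])
            for k in range(n)
        ]
    # Termination: sum over the component occupied at the last frame.
    return logsumexp(log_alpha)
```

As a sanity check on this sketch, setting every row of the mixture transition matrix to the plain mixture weights removes the dependency between successive components, and the stranded likelihood then reduces to the product of ordinary per-frame GMM likelihoods.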

Domains

Sound [cs.SD]; Signal and Image Processing [eess.SP]

Main file

inter2014_agorin_v11.pdf (206.82 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01063653

Submitted on: Friday, 12 September 2014, 16:03:11

Last modified on: Monday, 11 September 2023, 17:41:19

Long-term archiving on: Saturday, 13 December 2014, 10:46:34

Dates and versions

hal-01063653 , version 1 (12-09-2014)

Identifiers

HAL Id: hal-01063653, version 1

Cite

Arseniy Gorin, Denis Jouvet. Component Structuring and Trajectory Modeling for Speech Recognition. Interspeech, Sep 2014, Singapore, Singapore. ⟨hal-01063653⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD
