
Acoustic Model Structuring for Improving Automatic Speech Recognition Performance

Arseniy Gorin (1)
(1) PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This thesis focuses on acoustic model structuring for improving HMM-based automatic speech recognition. The structuring relies on unsupervised clustering of the speech utterances in the training data in order to handle speaker and channel variability. The idea is to split the data into acoustically similar classes. In the conventional multi-modeling (or class-based) approach, separate class-dependent models are built by adapting a speaker-independent model. As the number of classes increases, less data becomes available for estimating each class-based model, and its parameters become less reliable. One way to handle this problem is to modify the classification criterion applied to the training data so that a given utterance can belong to more than one class. This is achieved by relaxing the classification decision through a soft margin, and is investigated in the first part of the thesis. In the main part of the thesis, a novel approach is proposed that uses the clustered data more efficiently in a class-structured GMM. Instead of adapting all HMM-GMM parameters separately for each class of data, the class information is explicitly introduced into the GMM structure by associating each density component with a given class. Two approaches are proposed to exploit such a structured HMM-GMM efficiently. The first combines the class-structured GMM with class-dependent mixture weights: the Gaussian components are shared across speaker classes but are class-structured, while the mixture weights are class-dependent. To decode an utterance, the set of mixture weights is selected according to the estimated class. In the second approach, the mixture weights are replaced by density component transition probabilities. The proposed approaches are analyzed and evaluated on various speech data covering different sources of variability (age, gender, accent and noise).
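To make the first structured approach concrete, here is a minimal sketch of one HMM state's class-structured GMM with class-dependent mixture weights, in plain Python. It is an illustration under stated assumptions, not the thesis implementation: all names (`ClassStructuredGMM`, `log_gauss_diag`, `estimate_class`, `soft_margin_classes`) are hypothetical, the components use diagonal covariances, the HMM alignment is ignored (frames are scored independently), the utterance class is assumed to be estimated by picking the weight set that scores the utterance best, and the soft-margin rule (keep every class within a margin of the best score) is an assumed simplification of the relaxed classification described above.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class ClassStructuredGMM:
    """Hypothetical GMM for one HMM state: shared Gaussians, per-class weights.

    components[j] = (mean, var) of density component j, shared by all classes.
    class_of[j]   = class that component j is associated with (the structure).
    weights[c][j] = mixture weight of component j when class c is selected.
    """

    def __init__(self, components, class_of, weights):
        self.components = components
        self.class_of = class_of
        self.weights = weights

    def frame_loglik(self, x, c):
        """log p(x | state, class c), using class c's mixture weights."""
        terms = [
            math.log(w) + log_gauss_diag(x, mean, var)
            for w, (mean, var) in zip(self.weights[c], self.components)
            if w > 0.0  # skip components with zero weight for this class
        ]
        return logsumexp(terms)

    def utterance_loglik(self, frames, c):
        """Sum of frame log-likelihoods (no HMM alignment in this sketch)."""
        return sum(self.frame_loglik(x, c) for x in frames)

    def estimate_class(self, frames):
        """Assumed rule: choose the class whose weight set scores the utterance best."""
        return max(range(len(self.weights)),
                   key=lambda c: self.utterance_loglik(frames, c))


def soft_margin_classes(scores, margin):
    """Assumed soft-margin rule: keep every class within `margin` of the best score."""
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s >= best - margin)


# Toy 1-D example: components 0-1 are associated with class 0, 2-3 with class 1.
gmm = ClassStructuredGMM(
    components=[([-2.0], [1.0]), ([-1.0], [1.0]), ([1.0], [1.0]), ([2.0], [1.0])],
    class_of=[0, 0, 1, 1],
    weights=[[0.45, 0.45, 0.05, 0.05],   # class-0 weight set
             [0.05, 0.05, 0.45, 0.45]],  # class-1 weight set
)
frames = [[1.2], [1.8]]
c_hat = gmm.estimate_class(frames)  # selects the weight set used for decoding
```

Because the Gaussians are shared, each class only contributes its own weight vector rather than a full set of adapted means and variances; this is one way to read the abstract's claim that the class-structured model uses the clustered data more efficiently. With `soft_margin_classes`, a training utterance whose scores for two classes are close would be counted in both, which is the kind of overlap the soft margin is meant to allow.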

Cited literature: 107 references
Contributor: Arseniy Gorin
Submitted on: Monday, January 12, 2015 - 3:31:49 PM
Last modification on: Saturday, June 25, 2022 - 7:40:49 PM
Long-term archiving on: Monday, April 13, 2015 - 10:25:31 AM


HAL Id: tel-01751053, version 2


Arseniy Gorin. Acoustic Model Structuring for Improving Automatic Speech Recognition Performance. Sound [cs.SD]. Université de Lorraine, 2014. English. ⟨NNT : 2014LORR0161⟩. ⟨tel-01751053v2⟩
