Automatic Speech Recognition for Non-Native Speakers

Abstract : Automatic speech recognition technology has matured and is now widely integrated into many systems. However, speech recognition systems for non-native speakers still suffer from high error rates because of the mismatch between non-native speech and the trained models. Recording sufficient non-native speech for training is time-consuming and often difficult.
In this thesis, we propose approaches to adapt the acoustic and pronunciation models for non-native speakers under different resource constraints. Preliminary work on accent identification has also been carried out.
Multilingual acoustic modeling is proposed to model the cross-lingual transfer of non-native speakers and so overcome the difficulty of obtaining non-native speech. When multilingual acoustic models are available, a hybrid approach of acoustic interpolation and merging is proposed for adapting the target acoustic model. The proposed approach has also proven useful for context modeling. When multilingual corpora are available instead, a class of three interpolation methods is introduced for adaptation. Two of them are supervised speaker adaptation methods, which can be carried out with only a few non-native utterances.
In terms of pronunciation modeling, two existing approaches that model pronunciation variants, one in the pronunciation dictionary and the other in the rescoring module, have been revisited so that they can work with a limited amount of non-native speech. We have also proposed a speaker clustering approach called “latent pronunciation analysis” for clustering non-native speakers based on their pronunciation habits. This approach can also be used for pronunciation adaptation.
Finally, a text-dependent accent identification method is proposed. The approach can create robust accent models from a small amount of non-native speech. This is made possible by the generalizability of decision trees and the use of multilingual resources to improve the performance of the accent models.
Document type :
Theses

Cited literature [87 references]

https://tel.archives-ouvertes.fr/tel-00294973
Contributor : Tan Tien Ping
Submitted on : Thursday, July 10, 2008 - 6:57:59 PM
Last modification on : Thursday, October 11, 2018 - 8:48:01 AM
Document(s) archived on : Friday, May 28, 2010 - 9:38:31 PM

Identifiers

  • HAL Id : tel-00294973, version 1

Citation

Tan Tien Ping. Automatic Speech Recognition for Non-Native Speakers. Other [cs.OH]. Université Joseph-Fourier - Grenoble I, 2008. English. ⟨tel-00294973⟩
