Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Article Dans Une Revue Computer Speech and Language Année : 2009

Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Résumé

LVCSR systems are usually based on continuous density HMMs, which are typically implemented using Gaussian mixture distributions. Such statistical modeling systems tend to operate slower than real-time, largely because of the heavy computational overhead of the likelihood evaluation. The objective of our research is to investigate approximate methods that can substantially reduce the computational cost in likelihood evaluation without obviously degrading the recognition accuracy. In this paper, the most common techniques to speed up the likelihood computation are classified into three categories, namely machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of Gaussian mixtures within a GMM model are evaluated and analyzed to show that computations of some Gaussians are unnecessary and can thus be eliminated. Two commonly used techniques for likelihood approximation, namely VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on the analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS approach is a one-pass search technique which generates a dynamic shortlist of Gaussians for each state during the procedure of likelihood computation. In principle, DGS is an extension of both techniques of partial distance elimination and best mixture prediction, and it does not require additional memory for the storage of Gaussian shortlists. DGS algorithm has been implemented by modifying the likelihood computation procedure in HTK 3.4 system. Experimental results on TIMIT and WSJ0 corpora indicate that this approach can speed up the likelihood computation significantly without introducing apparent additional recognition error.
Fichier non déposé

Dates et versions

inria-00432533 , version 1 (16-11-2009)

Identifiants

  • HAL Id : inria-00432533 , version 1

Citer

Jun Cai, Ghazi Bouselmi, Yves Laprie, Jean-Paul Haton. Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition. Computer Speech and Language, 2009, 23 (2), pp.147-256. ⟨inria-00432533⟩
195 Consultations
0 Téléchargements

Partager

Gmail Facebook X LinkedIn More