Journal articles

Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Jun Cai 1 Ghazi Bouselmi 1 Yves Laprie 1 Jean-Paul Haton 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : LVCSR systems are usually based on continuous density HMMs, which are typically implemented using Gaussian mixture distributions. Such statistical modeling systems tend to run slower than real time, largely because of the heavy computational cost of likelihood evaluation. The objective of our research is to investigate approximate methods that can substantially reduce the computational cost of likelihood evaluation without noticeably degrading recognition accuracy. In this paper, the most common techniques for speeding up likelihood computation are classified into three categories, namely machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of Gaussian mixtures within a GMM is evaluated and analyzed to show that the computation of some Gaussians is unnecessary and can thus be eliminated. Two commonly used techniques for likelihood approximation, namely VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on these analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS is a one-pass search technique which generates a dynamic shortlist of Gaussians for each state during likelihood computation. In principle, DGS is an extension of both partial distance elimination and best mixture prediction, and it does not require additional memory for the storage of Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure in the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach can speed up the likelihood computation significantly without introducing apparent additional recognition error.
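As a rough illustration of partial distance elimination, one of the two techniques the abstract analyzes, the sketch below scores a diagonal-covariance GMM under the best-Gaussian (max) approximation and abandons a component as soon as its partial distance can no longer beat the best score found so far. This is a minimal sketch under stated assumptions, not the DGS algorithm from the paper and not HTK code; the function names `log_gauss_consts` and `best_component_pde`, the max approximation, and the diagonal covariances are all assumptions made for the example.

```python
import math


def log_gauss_consts(weights, variances):
    """Per-component constant: log weight plus the Gaussian normaliser."""
    return [
        math.log(w) - 0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        for w, var in zip(weights, variances)
    ]


def best_component_pde(x, means, variances, consts):
    """Best-Gaussian log score with partial distance elimination.

    Returns (best log score, index of the best component,
    number of per-dimension distance terms actually evaluated).
    """
    best_score, best_k, terms = -math.inf, -1, 0
    for k, (mu, var, c) in enumerate(zip(means, variances, consts)):
        # A component can only win if its full distance stays below this
        # limit, because its score is c - 0.5 * distance.
        limit = 2.0 * (c - best_score)
        dist = 0.0
        pruned = False
        for xd, md, vd in zip(x, mu, var):
            dist += (xd - md) ** 2 / vd  # partial distance never decreases
            terms += 1
            if dist > limit:
                pruned = True  # cannot beat the current best; stop early
                break
        if not pruned:
            best_score, best_k = c - 0.5 * dist, k
    return best_score, best_k, terms


if __name__ == "__main__":
    # Hypothetical toy data: two 4-dimensional components with unit variance.
    x = [0.0, 0.0, 0.0, 0.0]
    means = [[0.0] * 4, [5.0] * 4]
    variances = [[1.0] * 4, [1.0] * 4]
    consts = log_gauss_consts([0.5, 0.5], variances)
    score, k, terms = best_component_pde(x, means, variances, consts)
    print(f"best component {k}, log score {score:.4f}, terms evaluated {terms}")
```

In this toy case the second component is discarded after a single dimension, so five distance terms are evaluated instead of eight. How much is saved on real data depends on the order in which components and dimensions are visited, which this sketch does not try to optimize.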
Contributor : Yves Laprie
Submitted on : Monday, November 16, 2009 - 3:50:50 PM
Last modification on : Friday, February 26, 2021 - 3:28:05 PM


  • HAL Id : inria-00432533, version 1



Jun Cai, Ghazi Bouselmi, Yves Laprie, Jean-Paul Haton. Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition. Computer Speech and Language, Elsevier, 2009, 23 (2), pp.147-256. ⟨inria-00432533⟩


