Journal article, Computer Speech and Language, Year: 2009

Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Abstract

Large-vocabulary continuous speech recognition (LVCSR) systems are usually based on continuous-density HMMs, which are typically implemented using Gaussian mixture distributions. Such statistical modeling systems tend to run slower than real time, largely because of the heavy computational cost of likelihood evaluation. The objective of our research is to investigate approximate methods that substantially reduce this cost without noticeably degrading recognition accuracy. In this paper, the most common techniques for speeding up likelihood computation are classified into three categories: machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of the Gaussian components within a GMM is evaluated and analyzed to show that the computation of some Gaussians is unnecessary and can thus be eliminated. Two commonly used techniques for likelihood approximation, VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on these analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS is a one-pass search technique that generates a dynamic shortlist of Gaussians for each state during likelihood computation. In principle, DGS extends both partial distance elimination and best mixture prediction, and it requires no additional memory for storing Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure in the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach speeds up likelihood computation significantly without introducing apparent additional recognition error.
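As an illustration of one technique the abstract names, the following is a minimal sketch of partial distance elimination for a diagonal-covariance GMM. It is not the DGS algorithm of the paper and not HTK code. It assumes the state likelihood is approximated by its best-scoring component (the max approximation), and the names `log_const` and `best_gaussian_pde` are hypothetical. Because every per-dimension term can only lower a component's log score, the inner loop can stop as soon as the running score falls to or below the best score found so far.

```python
import math


def log_const(weight, variances):
    """Log of the mixture weight times the Gaussian normalisation term
    for a diagonal covariance (hypothetical helper for this sketch)."""
    dim = len(variances)
    log_det = sum(math.log(v) for v in variances)
    return math.log(weight) - 0.5 * (dim * math.log(2.0 * math.pi) + log_det)


def best_gaussian_pde(x, means, variances, log_consts):
    """Return (index, log score, terms evaluated) for the best component.

    Score of component m:
        log_consts[m] - 0.5 * sum_d (x[d] - means[m][d])**2 / variances[m][d]
    Each term of the sum is non-negative, so the running score never
    increases; a component is abandoned once it cannot beat the current best.
    """
    best_idx, best_score = -1, -math.inf
    terms = 0  # number of per-dimension terms actually computed
    for m, (mu, var, c) in enumerate(zip(means, variances, log_consts)):
        score = c
        pruned = False
        for xd, mud, vard in zip(x, mu, var):
            score -= 0.5 * (xd - mud) ** 2 / vard
            terms += 1
            if score <= best_score:
                pruned = True  # partial score already too low: stop early
                break
        if not pruned:
            best_idx, best_score = m, score
    return best_idx, best_score, terms


# Toy example with three unit-variance components in two dimensions.
x = [0.0, 0.0]
means = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0]]
variances = [[1.0, 1.0]] * 3
consts = [log_const(1.0 / 3.0, v) for v in variances]
idx, score, terms = best_gaussian_pde(x, means, variances, consts)
```

In this toy case only the first component is evaluated over both dimensions; the other two are abandoned after a single term, so 4 of the 6 per-dimension terms are computed. How much is saved in practice depends on the order in which components and dimensions are visited, which this sketch does not try to optimize.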

Dates and versions

inria-00432533, version 1 (16-11-2009)

Identifiers

  • HAL Id: inria-00432533, version 1

Cite

Jun Cai, Ghazi Bouselmi, Yves Laprie, Jean-Paul Haton. Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition. Computer Speech and Language, 2009, 23 (2), pp.147-256. ⟨inria-00432533⟩