Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Jun Cai; Ghazi Bouselmi; Yves Laprie; Jean-Paul Haton

Journal article, Computer Speech and Language, 2009

Affiliation (all four authors): Analysis, perception and recognition of speech

Yves Laprie : IdHAL yves-laprie ; ORCID 0000-0002-2379-6481 ; IdRef 060274387

Abstract

Large-vocabulary continuous speech recognition (LVCSR) systems are usually based on continuous density HMMs, which are typically implemented using Gaussian mixture distributions. Such statistical modeling systems tend to run slower than real time, largely because of the heavy computational cost of likelihood evaluation. The objective of our research is to investigate approximate methods that substantially reduce the computational cost of likelihood evaluation without noticeably degrading recognition accuracy. In this paper, the most common techniques for speeding up likelihood computation are classified into three categories: machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of the Gaussian mixtures within a GMM is evaluated and analyzed to show that the computation of some Gaussians is unnecessary and can thus be eliminated. Two commonly used techniques for likelihood approximation, VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on these analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS is a one-pass search technique that generates a dynamic shortlist of Gaussians for each state during likelihood computation. In principle, DGS extends both partial distance elimination and best mixture prediction, and it requires no additional memory for storing Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure of the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach speeds up likelihood computation significantly without introducing apparent additional recognition errors.
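
As an illustration of partial distance elimination, one of the two baseline techniques named in the abstract, the following is a minimal sketch in Python. It is not the DGS algorithm of the paper and not the HTK 3.4 code. It assumes diagonal covariances and the common approximation of a state likelihood by the score of its best Gaussian; the function names and the toy data are hypothetical and chosen only for this example.

```python
import math


def log_gaussian_constant(weight, variances):
    """Log mixture weight plus the log normalisation term of a diagonal Gaussian."""
    return math.log(weight) - 0.5 * sum(math.log(2.0 * math.pi * v) for v in variances)


def best_component_full(x, weights, means, variances):
    """Baseline: evaluate every dimension of every Gaussian, keep the best score."""
    best_score, best_index = -math.inf, -1
    for m, (w, mu, var) in enumerate(zip(weights, means, variances)):
        score = log_gaussian_constant(w, var)
        for xd, md, vd in zip(x, mu, var):
            score -= 0.5 * (xd - md) ** 2 / vd
        if score > best_score:
            best_score, best_index = score, m
    return best_score, best_index


def best_component_pde(x, weights, means, variances):
    """Same result as the baseline, but abandon a Gaussian as soon as its
    partial score drops below the best complete score found so far.

    Each per-dimension term can only lower the score, so a Gaussian whose
    partial score is already below the current best cannot win.
    Returns (best_score, best_index, dims_evaluated).
    """
    best_score, best_index = -math.inf, -1
    dims_evaluated = 0
    for m, (w, mu, var) in enumerate(zip(weights, means, variances)):
        score = log_gaussian_constant(w, var)
        pruned = False
        for xd, md, vd in zip(x, mu, var):
            score -= 0.5 * (xd - md) ** 2 / vd
            dims_evaluated += 1
            if score < best_score:
                pruned = True  # remaining dimensions cannot raise the score
                break
        if not pruned and score > best_score:
            best_score, best_index = score, m
    return best_score, best_index, dims_evaluated


if __name__ == "__main__":
    # Toy 3-dimensional mixture with unit variances (hypothetical values).
    x = [0.0, 0.0, 0.0]
    weights = [0.5, 0.3, 0.2]
    means = [[0.1, 0.0, -0.1], [5.0, 5.0, 5.0], [-4.0, 3.0, 2.0]]
    variances = [[1.0, 1.0, 1.0]] * 3
    print(best_component_full(x, weights, means, variances))
    print(best_component_pde(x, weights, means, variances))
```

In this sketch the saving depends on the order in which the Gaussians are visited: the earlier a high-scoring Gaussian is evaluated, the sooner the remaining ones are abandoned.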

Keywords

Hidden Markov Modeling; Automatic Speech Recognition

Domains

Signal and Image Processing

Contributor : Yves Laprie

https://inria.hal.science/inria-00432533

Submitted on : Monday, November 16, 2009, 3:50:50 PM

Last modification on : Friday, March 24, 2023, 2:52:52 PM

Dates and versions

inria-00432533, version 1 (16-11-2009)

Identifiers

HAL Id : inria-00432533, version 1

Cite

Jun Cai, Ghazi Bouselmi, Yves Laprie, Jean-Paul Haton. Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition. Computer Speech and Language, 2009, 23 (2), pp.147-256. ⟨inria-00432533⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA
