Conference papers

Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv

Imed Zitouni 1 Kamel Smaïli 1 Jean-Paul Haton 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this paper, we describe a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MCnv, where n is the maximum number of words in a sequence and $\nu$ is the maximum number of levels. The originality of this model is its ability to take into account dependent variable-length sequences for very large vocabularies. To discover the variable-length sequences and build the hierarchy, we use a set of 233 syntactic classes extracted from the eight elementary French grammatical classes. The MCnv model learns hierarchical word patterns and uses them to re-evaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model has been trained on a corpus (LeM) of 43 million words extracted from "Le Monde", a French newspaper, and uses a vocabulary of 20000 words. Tests have been conducted on 300 sentences. The results show a 17% decrease in perplexity compared to an interpolated class trigram model. Rescoring the original n-best hypotheses also yields a 5% improvement in accuracy.
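The two evaluation steps named in the abstract, perplexity and n-best rescoring, can be illustrated with a minimal sketch. This is not the MCnv model or the MAUD recognizer: the unigram table, the function names, the `lm_weight` parameter, and the scores are all hypothetical stand-ins chosen only to show the general idea of re-ranking recognizer hypotheses with a language model score.

```python
import math

def perplexity(log_probs):
    """Perplexity from a list of per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def rescore_nbest(hypotheses, lm_logprob, lm_weight=1.0):
    """Re-rank (text, acoustic_log_score) pairs.

    Each hypothesis gets acoustic score + lm_weight * LM log-probability;
    the list of texts is returned best first.
    """
    scored = [(acoustic + lm_weight * lm_logprob(text), text)
              for text, acoustic in hypotheses]
    scored.sort(reverse=True)
    return [text for _, text in scored]

# Hypothetical toy language model: a unigram table, NOT the MCnv model.
TOY_LOGPROBS = {
    "le": math.log(0.5),
    "chat": math.log(0.25),
    "chah": math.log(0.01),
    "dort": math.log(0.25),
}

def toy_lm_logprob(text):
    """Sum of unigram log-probabilities, with a small floor for unknown words."""
    return sum(TOY_LOGPROBS.get(w, math.log(1e-4)) for w in text.split())

# Invented n-best list: the acoustically preferred hypothesis contains an unlikely word.
nbest = [("le chah dort", -10.0), ("le chat dort", -10.5)]
best = rescore_nbest(nbest, toy_lm_logprob)[0]
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

In this toy setting the language model score outweighs the small acoustic advantage of the first hypothesis, so the second one is ranked first. A lower perplexity on held-out text means the model assigns higher average probability to each word, which is the sense in which the abstract reports a 17% decrease.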
Document type :
Conference papers

Contributor : Publications Loria
Submitted on : Tuesday, September 26, 2006 - 8:47:52 AM
Last modification on : Friday, March 26, 2021 - 11:53:22 AM
Long-term archiving on : Wednesday, March 29, 2017 - 12:37:38 PM


  • HAL Id : inria-00099045, version 1



Imed Zitouni, Kamel Smaïli, Jean-Paul Haton. Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv. SCI 2000 - 4th World Multiconference on Systemics, Cybernetics & Informatics, Jul 2000, Orlando, United States. pp.6. ⟨inria-00099045⟩


