Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv
Conference paper (2000)

Imed Zitouni (1), Kamel Smaïli (1), Jean-Paul Haton (1)

Abstract

In this paper, we describe a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MCnv, where $n$ is the maximum number of words in a sequence and $\nu$ is the maximum number of levels. The originality of this model is its ability to take into account dependent variable-length sequences for a very large vocabulary. To discover the variable-length sequences and build the hierarchy, we use a set of 233 syntactic classes derived from the eight elementary French grammatical classes. The MCnv model learns hierarchical word patterns and uses them to re-evaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model has been trained on a corpus (LeM) of 43 million words extracted from the French newspaper "Le Monde" and uses a vocabulary of 20,000 words. Tests have been conducted on 300 sentences. The results show a 17% decrease in perplexity compared with an interpolated class trigram model. Rescoring the original n-best hypotheses also yields a 5% improvement in accuracy.
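The abstract evaluates MCnv in two ways: perplexity against a class-based baseline, and re-ranking of n-best recognizer hypotheses. The sketch below illustrates only those two generic measurements, not the MCnv hierarchy itself. It assumes a class bigram, P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i), with add-one smoothing as a stand-in model (the paper's baseline is an interpolated class trigram). The toy word-to-class map, the training sentences, `ClassBigramLM`, `perplexity`, `rescore`, and the `lm_weight` value are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical word -> class map; the paper uses 233 syntactic classes,
# these three toy classes are invented for illustration only.
WORD2CLASS = {"le": "DET", "la": "DET", "chat": "NOUN", "souris": "NOUN",
              "dort": "VERB", "mange": "VERB"}
CLASSES = sorted(set(WORD2CLASS.values()))


class ClassBigramLM:
    """Hypothetical stand-in model (not MCnv):
    P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), both add-one smoothed."""

    def __init__(self, sentences):
        self.ctx = Counter()    # counts of each class used as a context c_{i-1}
        self.pair = Counter()   # counts of class transitions (c_{i-1}, c_i)
        self.cls = Counter()    # counts of each class as an emitted token
        self.word = Counter()   # counts of each word
        for words in sentences:
            classes = ["<s>"] + [WORD2CLASS[w] for w in words] + ["</s>"]
            self.ctx.update(classes[:-1])
            self.pair.update(zip(classes[:-1], classes[1:]))
            self.cls.update(WORD2CLASS[w] for w in words)
            self.word.update(words)
        self.n_outcomes = len(CLASSES) + 1            # every class plus </s>
        self.class_size = Counter(WORD2CLASS.values())  # words per class

    def logprob(self, words):
        """Natural-log probability of a sentence, including the </s> transition."""
        classes = ["<s>"] + [WORD2CLASS[w] for w in words] + ["</s>"]
        total = 0.0
        for i in range(1, len(classes)):
            prev, cur = classes[i - 1], classes[i]
            # Class transition term P(c_i | c_{i-1}).
            total += math.log((self.pair[(prev, cur)] + 1)
                              / (self.ctx[prev] + self.n_outcomes))
            if cur != "</s>":
                # Word emission term P(w_i | c_i).
                w = words[i - 1]
                total += math.log((self.word[w] + 1)
                                  / (self.cls[cur] + self.class_size[cur]))
        return total


def perplexity(lm, sentences):
    """PP = exp(-(1/N) * sum of log-probs), N = words plus one </s> per sentence."""
    n = sum(len(s) + 1 for s in sentences)
    return math.exp(-sum(lm.logprob(s) for s in sentences) / n)


def rescore(nbest, lm, lm_weight=8.0):
    """Re-rank (words, acoustic_logscore) pairs by acoustic + lm_weight * LM score.
    lm_weight is a hypothetical tuning value, not taken from the paper."""
    return sorted(nbest, key=lambda h: h[1] + lm_weight * lm.logprob(h[0]),
                  reverse=True)


# Toy usage (training data and acoustic scores invented for illustration).
train = [["le", "chat", "dort"], ["la", "souris", "mange"], ["le", "chat", "mange"]]
lm = ClassBigramLM(train)
print(round(perplexity(lm, [["la", "chat", "dort"]]), 2))  # 3.14
# The acoustic score favors the ill-formed word order; the LM term flips the ranking.
nbest = [(["chat", "le", "dort"], -100.0), (["le", "chat", "dort"], -101.0)]
print(rescore(nbest, lm)[0][0])  # ['le', 'chat', 'dort']
```

Log-linear combination with a single LM weight is a common way to merge acoustic and language-model scores during n-best rescoring; how MAUD actually combines them is not stated in the abstract.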

Dates and versions

inria-00099045, version 1 (26-09-2006)

Identifiers

  • HAL Id: inria-00099045, version 1

Citation

Imed Zitouni, Kamel Smaïli, Jean-Paul Haton. Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv. SCI 2000 - 4th World Multiconference on Systemics, Cybernetics and Informatics, Jul 2000, Orlando, United States. pp.6. ⟨inria-00099045⟩