Conference paper, Year: 2000

Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv

Abstract

In this paper, we describe a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MCnv, where n is the maximum number of words in a sequence and $\nu$ is the maximum number of levels. The originality of this model is its ability to take into account dependent variable-length sequences for very large vocabularies. To discover the variable-length sequences and build the hierarchy, we use a set of 233 syntactic classes extracted from the eight elementary French grammatical classes. The MCnv model learns hierarchical word patterns and uses them to re-evaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model has been trained on a corpus (LeM) of 43 million words extracted from "Le Monde", a French newspaper, and uses a vocabulary of 20,000 words. Tests were conducted on 300 sentences. The model achieves a 17% decrease in perplexity compared to an interpolated class trigram model. Rescoring the original n-best hypotheses also yields a 5% improvement in accuracy.
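The two evaluation notions used in the abstract, perplexity and n-best rescoring, can be illustrated with a minimal sketch. This is not the MCnv model or the MAUD recognizer: the toy unigram table, the function names, and the `lm_weight` parameter are all assumptions made for illustration, and the additive score combination is just one common way to rescore hypotheses.

```python
import math

def perplexity(log_probs):
    """Perplexity from a list of per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def rescore_nbest(hypotheses, lm_logprob, lm_weight=1.0):
    """Re-rank (words, recognizer_score) pairs, best first.

    The combined score is recognizer_score + lm_weight * lm_logprob(words).
    This additive combination is an assumption, not the paper's formula.
    """
    scored = [(rec + lm_weight * lm_logprob(words), words)
              for words, rec in hypotheses]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [words for _, words in scored]

# Hypothetical toy unigram model standing in for a real language model.
TOY_UNIGRAM = {"le": 0.4, "monde": 0.3, "mode": 0.05}

def toy_lm_logprob(words):
    """Sum of log-probabilities, with a small floor for unseen words."""
    return sum(math.log(TOY_UNIGRAM.get(w, 0.01)) for w in words)

# Two competing hypotheses; the recognizer slightly prefers the first one.
nbest = [(["le", "mode"], -1.0), (["le", "monde"], -1.2)]
best = rescore_nbest(nbest, toy_lm_logprob)[0]
```

In this toy case the language model score outweighs the small recognizer preference, so the rescored list ranks "le monde" first. A lower perplexity on held-out text indicates that a model assigns higher average probability to the words it sees.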
Main file
A00-R-193.pdf (106.41 KB)

Dates and versions

inria-00099045, version 1 (26-09-2006)

Identifiers

  • HAL Id: inria-00099045, version 1

Cite

Imed Zitouni, Kamel Smaïli, Jean-Paul Haton. Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv. SCI 2000 - 4th World Multiconference on Systemics, Cybernetics & Informatics, Jul 2000, Orlando, United States. pp.6. ⟨inria-00099045⟩