# Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv

1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper, we describe a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MCnv, where n is the maximum number of words in a sequence and $\nu$ is the maximum number of levels. The originality of this model is its ability to take into account dependent variable-length sequences for very large vocabularies. To discover the variable-length sequences and build the hierarchy, we use a set of 233 syntactic classes extracted from the eight elementary French grammatical classes. The MCnv model learns hierarchical word patterns and uses them to re-evaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model was trained on a corpus (LeM) of 43 million words extracted from "Le Monde", a French newspaper, and uses a vocabulary of 20,000 words. Tests were conducted on 300 sentences. The results show a 17% decrease in perplexity compared to an interpolated class trigram model. Rescoring the original n-best hypotheses also yields a 5% improvement in accuracy.
Document type:
Conference paper
4th World Multiconference on Systemics, Cybernetics & Informatics, 2000, Orlando, USA, 6 p, 2000

Cited literature [18 references]

https://hal.inria.fr/inria-00099045
Contributor: Publications Loria
Submitted on: Tuesday, September 26, 2006 - 08:47:52
Last modified on: Thursday, January 11, 2018 - 06:19:57
Document(s) archived on: Wednesday, March 29, 2017 - 12:37:38

### Identifiers

• HAL Id: inria-00099045, version 1

### Citation

Imed Zitouni, Kamel Smaïli, Jean-Paul Haton. Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv. 4th World Multiconference on Systemics, Cybernetics & Informatics, 2000, Orlando, USA, 6 p, 2000. 〈inria-00099045〉
