Statistical Language Model based on a Hierarchical Approach: MC$_{n}^{\nu}$

Imed Zitouni 1 Kamel Smaïli 1 Jean-Paul Haton 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper, we propose a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MC$_{n}^{\nu}$, where $n$ is the maximum number of words in a sequence and $\nu$ is the maximum number of levels in the hierarchy. The originality of this model lies in its capacity to take into account dependent variable-length sequences for very large vocabularies. To discover the variable-length sequences and build the hierarchy, we use a set of $233$ syntactic classes extracted from the $8$ French elementary grammatical classes. The MC$_{n}^{\nu}$ model learns hierarchical word patterns and uses them to re-evaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model was trained on a corpus of $43$ million words extracted from a French newspaper and uses a vocabulary of $20000$ words. Tests were conducted on $300$ sentences. The model achieved a $17\%$ decrease in perplexity compared to an interpolated class trigram model, and rescoring the original n-best hypotheses improved accuracy by $5\%$.
Document type:
Conference paper
7th European Conference on Speech Communication and Technology - EUROSPEECH 2001, 2001, Aalborg, Denmark, 1, pp.29, 2001

https://hal.inria.fr/inria-00100677
Contributor: Publications Loria
Submitted on: Tuesday, 26 September 2006 - 14:49:02
Last modified on: Thursday, 11 January 2018 - 06:19:55

Identifiers

  • HAL Id : inria-00100677, version 1

Citation

Imed Zitouni, Kamel Smaïli, Jean-Paul Haton. Statistical Language Model based on a Hierarchical Approach : MCnv. 7th european conference on speech communication and technology - EUROSPEECH 2001, 2001, Aalborg, Denmark, 1, pp.29, 2001. 〈inria-00100677〉
