Adding new words into a language model using parameters of known words with similar behavior

Luiza Orosanu (1), Denis Jouvet (1)
(1) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This article studies how to automatically add new words to a language model without retraining or adapting it, which would require a large amount of new data. The proposed approach finds a list of similar known words for each new word to be added to the language model. Using a small set of sentences containing the new words and a set of n-gram counts containing the known words, we search for the known words whose neighbor distributions (over the few preceding and few following words) are most similar to those of the new words. Similarity is measured by computing Kullback-Leibler (KL) divergences between the neighbor-word distributions. The n-gram parameter values associated with the similar words are then used to define the n-gram parameter values of the new words. In a speech recognition context, evaluation on a large-vocabulary continuous speech recognition (LVCSR) task shows the benefit of the proposed approach.
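The similarity search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`neighbor_distribution`, `kl_divergence`, `most_similar_words`), the window size, the additive smoothing constant `alpha`, and the direction of the divergence (new word against known word) are all assumptions made for illustration. For simplicity, the sketch also derives the neighbor counts of known words from raw sentences, whereas the abstract states that they come from a set of n-gram counts.

```python
import math
from collections import Counter


def neighbor_distribution(sentences, target, window=2):
    """Count the words found within `window` positions of `target`.

    Keys are (offset, word) pairs, so preceding and following neighbors
    are kept apart. The window size is an assumption.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    counts[(offset, tokens[j])] += 1
    return counts


def kl_divergence(p_counts, q_counts, support, alpha=0.1):
    """KL(P || Q) over `support`, with additive smoothing (an assumption)."""
    p_total = sum(p_counts.get(k, 0) for k in support) + alpha * len(support)
    q_total = sum(q_counts.get(k, 0) for k in support) + alpha * len(support)
    divergence = 0.0
    for key in support:
        p = (p_counts.get(key, 0) + alpha) / p_total
        q = (q_counts.get(key, 0) + alpha) / q_total
        divergence += p * math.log(p / q)
    return divergence


def most_similar_words(new_word, new_sentences, known_sentences,
                       known_words, top_k=3, window=2):
    """Rank known words by how closely their neighbors match the new word's."""
    new_counts = neighbor_distribution(new_sentences, new_word, window)
    known_counts = {
        word: neighbor_distribution(known_sentences, word, window)
        for word in known_words
    }
    # Use one shared support so that all divergences are comparable.
    support = set(new_counts)
    for counts in known_counts.values():
        support |= set(counts)
    ranked = sorted(
        known_words,
        key=lambda word: kl_divergence(new_counts, known_counts[word], support),
    )
    return ranked[:top_k]
```

Called with a few sentences containing a new word and a list of candidate known words, `most_similar_words` returns the candidates with the lowest divergence first. How the n-gram parameter values of those similar words are then combined to define the parameters of the new word is not specified in the abstract, so that step is left out of the sketch.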
Document type: Conference paper
International Conference on Natural Language and Speech Processing, Oct 2015, Alger, Algeria. Proceedings ICNLSP'2015, International Conference on Natural Language and Speech Processing
Cited literature: 25 references

https://hal.inria.fr/hal-01184194
Contributor: Denis Jouvet
Submitted on: Thursday, 13 August 2015 - 11:22:25
Last modified on: Thursday, 11 January 2018 - 06:27:31
Document(s) archived on: Saturday, 14 November 2015 - 10:15:31

File

articleICNLSP2015-NW-final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01184194, version 1

Citation

Luiza Orosanu, Denis Jouvet. Adding new words into a language model using parameters of known words with similar behavior. International Conference on Natural Language and Speech Processing, Oct 2015, Alger, Algeria. Proceedings ICNLSP'2015, International Conference on Natural Language and Speech Processing. 〈hal-01184194〉
