Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition

Imran Sheikh 1, Dominique Fohr 1, Irina Illina 1, Georges Linares 2
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: The diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However, PNs are important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from Latent Dirichlet Allocation (LDA) topic models, continuous word vector representations, and the Neural Bag-of-Words (NBOW) model, which is capable of learning task-specific word and context representations. We propose a Neural Bag-of-Weighted-Words (NBOW2) model, which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives faster convergence during training. Second-pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
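The difference between the NBOW and NBOW2 context representations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract states only that NBOW2 learns to weight important words more heavily. The sigmoid scoring against a learned "anchor" vector, the softmax output layer over candidate OOV PNs, and all names below (`context_vector`, `rank_oov_pns`, `anchor`) are assumptions made for illustration, and the parameters are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(word_vecs, anchor=None):
    """Return (context vector, per-word weights).

    anchor=None  -> NBOW-style plain average of the word vectors.
    anchor given -> NBOW2-style weighted average; each weight is
                    sigmoid(word_vector . anchor) (assumed form).
    """
    n = len(word_vecs)
    dim = len(word_vecs[0])
    if anchor is None:
        weights = [1.0] * n
    else:
        weights = [sigmoid(dot(v, anchor)) for v in word_vecs]
    z = [sum(w * v[i] for w, v in zip(weights, word_vecs)) / n
         for i in range(dim)]
    return z, weights


def rank_oov_pns(word_vecs, output_weights, output_bias, anchor=None):
    """Score candidate OOV PNs from a context and rank them by probability."""
    z, weights = context_vector(word_vecs, anchor)
    scores = [dot(row, z) + b for row, b in zip(output_weights, output_bias)]
    probs = softmax(scores)
    ranking = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return ranking, probs, weights


# Toy example with random (untrained) parameters, for shape checking only.
rng = random.Random(0)
dim, n_words, n_pns = 4, 5, 3
word_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]
anchor = [rng.uniform(-1, 1) for _ in range(dim)]
output_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_pns)]
output_bias = [0.0] * n_pns

ranking, probs, weights = rank_oov_pns(word_vecs, output_weights, output_bias, anchor)
```

In a trained model the word vectors, the anchor vector, and the output layer would be learned jointly from documents labelled with the OOV PNs they contain. Here they are random, so the resulting ranking carries no meaning.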
Document type:
Journal article
IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2017, 25 (3), pp.598 - 610. 〈10.1109/TASLP.2017.2651361〉
Cited literature: 76 references

https://hal.inria.fr/hal-01461617
Contributor: Imran Sheikh
Submitted on: Wednesday, February 8, 2017 - 12:11:23
Last modified on: Friday, January 26, 2018 - 10:47:07
Document(s) archived on: Tuesday, May 9, 2017 - 13:11:03

File

draft.pdf
Files produced by the author(s)

Citation

Imran Sheikh, Dominique Fohr, Irina Illina, Georges Linares. Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2017, 25 (3), pp.598 - 610. 〈10.1109/TASLP.2017.2651361〉. 〈hal-01461617〉

Metrics

Record views

734

File downloads

270