Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition

Imran Ahamad Sheikh; Dominique Fohr; Irina Illina; Georges Linares

doi:10.1109/TASLP.2017.2651361

Journal article, IEEE/ACM Transactions on Audio, Speech and Language Processing, Year: 2017

Imran Ahamad Sheikh

Role: Author
PersonId: 1000772

Speech Modeling for Facilitating Oral-Based Communication

Dominique Fohr

Role: Author
PersonId: 15652
IdHAL: dominique-fohr
IdRef: 031092942

Speech Modeling for Facilitating Oral-Based Communication

Irina Illina

Role: Author
PersonId: 15663
IdHAL: irina-illina
IdRef: 120731746

Speech Modeling for Facilitating Oral-Based Communication

Georges Linares

Role: Author
PersonId: 4977
IdHAL: georges-linares
IdRef: 079368794

Laboratoire Informatique d'Avignon

Abstract

The diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However, PNs are important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from Latent Dirichlet Allocation (LDA) topic models, continuous word vector representations, and the Neural Bag-of-Words (NBOW) model, which is capable of learning task-specific word and context representations. We propose a Neural Bag-of-Weighted-Words (NBOW2) model, which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives faster convergence during training. Second-pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
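
To make the difference between NBOW and NBOW2 concrete, the following is a minimal sketch of the two context representations and of ranking candidate OOV PNs from them. The abstract does not give the model equations, so everything specific here is an assumption made for illustration: the sigmoid form of the word weights, the trainable vector called `anchor`, the softmax scoring over candidate PNs, and all function names. It is not the authors' implementation, and it shows only the forward pass, not training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nbow_context(word_vectors):
    """NBOW: context vector is the unweighted average of the word vectors."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(v[d] for v in word_vectors) / n for d in range(dim)]


def nbow2_context(word_vectors, anchor):
    """NBOW2-style: average of word vectors scaled by a per-word weight.

    Assumption: weight = sigmoid(word_vector . anchor), where `anchor`
    is a hypothetical trainable vector learned jointly with the embeddings.
    """
    weights = [sigmoid(dot(v, anchor)) for v in word_vectors]
    n = len(word_vectors)
    dim = len(word_vectors[0])
    context = [
        sum(w * v[d] for w, v in zip(weights, word_vectors)) / n
        for d in range(dim)
    ]
    return context, weights


def score_oov_pns(context, pn_vectors):
    """Probability of each candidate OOV PN given the context vector."""
    return softmax([dot(context, p) for p in pn_vectors])


if __name__ == "__main__":
    # Toy 2-dimensional vectors for three in-vocabulary context words.
    words = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
    # Toy output vectors for three candidate OOV proper names.
    candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    context, weights = nbow2_context(words, anchor=[0.5, -0.5])
    print("word weights:", weights)
    print("PN probabilities:", score_oov_pns(context, candidates))
```

In this sketch, a word whose vector aligns with `anchor` receives a weight close to 1 and dominates the context vector, while other words are damped; with a zero `anchor` every weight is 0.5 and the result is a scaled copy of the plain NBOW average.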

Keywords

large vocabulary continuous speech recognition; out-of-vocabulary; proper names; semantic context

Domains

Computation and Language [cs.CL]

Main file

draft.pdf (449.68 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01461617

Submitted on: Wednesday, 8 February 2017, 12:11:23

Last modified on: Monday, 11 September 2023, 17:41:19

Long-term archiving on: Tuesday, 9 May 2017, 13:11:03

Dates and versions

hal-01461617, version 1 (08-02-2017)

Identifiers

HAL Id: hal-01461617, version 1
DOI: 10.1109/TASLP.2017.2651361

Cite

Imran Ahamad Sheikh, Dominique Fohr, Irina Illina, Georges Linares. Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017, 25 (3), pp. 598-610. ⟨10.1109/TASLP.2017.2651361⟩. ⟨hal-01461617⟩

Collections

UNIV-AVIGNON CNRS INRIA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD LIA SILECS ANR
