Training RNN Language Models on Uncertain ASR Hypotheses in Limited Data Scenarios

Imran Ahamad Sheikh; Emmanuel Vincent; Irina Illina

doi:10.1016/j.csl.2023.101555

Journal article, Computer Speech and Language, Year: 2024


Imran Ahamad Sheikh

Role: Author
PersonId : 1000772

Speech Modeling for Facilitating Oral-Based Communication

Emmanuel Vincent

Role: Author
PersonId : 1256
IdHAL : emmanuelv
ORCID : 0000-0002-0183-7289
IdRef : 089360176

Speech Modeling for Facilitating Oral-Based Communication

Irina Illina

Role: Author
PersonId : 15663
IdHAL : irina-illina
IdRef : 120731746

Speech Modeling for Facilitating Oral-Based Communication

Abstract

Training domain-specific automatic speech recognition (ASR) systems requires a sufficient amount of data from the target domain. In several scenarios, such as early development stages, privacy-critical applications, or under-resourced languages, only a limited amount of in-domain speech data and an even smaller amount of manual text transcriptions, if any, are available. This motivates the study of ASR language model (LM) training on a limited amount of in-domain speech data. Early works have attempted to train n-gram LMs from ASR N-best lists and lattices, but the training and adaptation of recurrent neural network (RNN) LMs from ASR transcripts have not received attention. In this work, we study the training and adaptation of RNN LMs using alternate, uncertain ASR hypotheses embedded in ASR confusion networks obtained from target-domain speech data. We explore different methods for training the RNN LMs to deal with the uncertain input sequences. The first method extends the cross-entropy objective into a Kullback–Leibler (KL) divergence based training loss, the second method formulates a training loss based on a hidden Markov model (HMM), and the third method performs training on paths sampled from the confusion networks. These methods are applied to limited data setups including telephone and meeting conversation datasets. Performance is evaluated under two settings, in which either no manual transcriptions or a small amount of manual transcriptions are available to aid the training. Moreover, a model adaptation setting is also evaluated wherein the RNN LM is pre-trained on an out-of-domain conversational corpus. Overall, the sampling method for training RNN LMs on ASR confusion networks performs the best, and results in up to 12% relative reduction in perplexity on the meeting dataset as compared to training on ASR 1-best hypotheses, without any manual transcriptions. However, the perplexity reductions do not translate into equivalent word error rate (WER) reductions. A detailed analysis of the perplexity reductions obtained by the different methods is performed in order to understand this effect.
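To make the first and third methods more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a confusion network can be represented as a list of bins, each mapping candidate words to posterior probabilities. The `<eps>` skip label, the independent per-bin sampling, the function names, and the toy posteriors are all illustrative assumptions; the exact loss formulations in the paper may differ.

```python
import math
import random

EPS = "<eps>"  # hypothetical label for an empty (skip) arc


def sample_path(confusion_network, rng):
    """Sample one word sequence, drawing each bin independently (assumption)."""
    path = []
    for bin_posteriors in confusion_network:
        words = list(bin_posteriors)
        weights = [bin_posteriors[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        if word != EPS:  # empty arcs contribute no word to the path
            path.append(word)
    return path


def kl_loss(bin_posteriors, predicted):
    """KL(posterior || prediction) for one bin.

    With a one-hot posterior this reduces to the usual cross-entropy term.
    """
    return sum(
        p * math.log(p / predicted[w])
        for w, p in bin_posteriors.items()
        if p > 0.0
    )


# Toy confusion network with made-up posteriors (not data from the paper).
cn = [
    {"i": 0.9, "eye": 0.1},
    {"see": 0.6, "sea": 0.3, EPS: 0.1},
    {"you": 1.0},
]
rng = random.Random(0)
paths = [sample_path(cn, rng) for _ in range(3)]
```

In such a setup, each sampled path could be fed to an ordinary RNN LM training loop in place of the 1-best hypothesis, whereas `kl_loss` would replace the per-word cross-entropy term when the bin posteriors are used directly as soft targets.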

Keywords

automatic speech recognition; language models; recurrent neural networks; confusion networks

Domains

Computation and Language [cs.CL]; Machine Learning [cs.LG]

Main file

hal-03327306.pdf (451.76 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-03327306

Submitted on: Monday, 21 August 2023, 16:29:02

Last modified on: Wednesday, 3 April 2024, 14:31:42

Dates and versions

hal-03327306 , version 1 (27-08-2021)

hal-03327306 , version 2 (21-08-2023)

License

Attribution

Identifiers

HAL Id : hal-03327306 , version 2
DOI : 10.1016/j.csl.2023.101555

Cite

Imran Ahamad Sheikh, Emmanuel Vincent, Irina Illina. Training RNN Language Models on Uncertain ASR Hypotheses in Limited Data Scenarios. Computer Speech and Language, 2024, 83, pp.101555. ⟨10.1016/j.csl.2023.101555⟩. ⟨hal-03327306v2⟩


Collections

CNRS INRIA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD SILECS

314 views

290 downloads
