BERT-based Semantic Model for Rescoring N-best Speech Recognition List
Conference paper, 2021


Abstract

This work aims to improve automatic speech recognition (ASR) by modeling long-term semantic relations. We propose to do this by rescoring the ASR N-best hypotheses list. To achieve this, we propose two deep neural network (DNN) models that combine semantic, acoustic, and linguistic information. Our DNN rescoring models are designed to select hypotheses with better semantic consistency and therefore lower WER. As part of the input features to our DNN models, we investigate a powerful representation: dynamic contextual embeddings from Transformer-based BERT. Acoustic and linguistic features are also included. We perform experiments on the publicly available TED-LIUM dataset. We evaluate in clean and noisy conditions, with both an n-gram language model and a recurrent neural network language model (RNNLM), specifically a Long Short-Term Memory (LSTM) model. The proposed rescoring approaches give significant WER improvements over the ASR system without rescoring. Furthermore, the combination of rescoring methods based on BERT and GPT-2 scores achieves the best results.
Main file: Interspeech_Bertsem_v13.pdf (767.38 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03248881, version 1 (03-06-2021)

Cite

Dominique Fohr, Irina Illina. BERT-based Semantic Model for Rescoring N-best Speech Recognition List. INTERSPEECH 2021, Aug 2021, Brno, Czech Republic. ⟨10.21437/Interspeech.2021-313⟩. ⟨hal-03248881⟩