Conference papers

Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

Adrien Dufraux (1, 2), Emmanuel Vincent (2), Awni Hannun (3), Armelle Brun (4), Matthijs Douze (1)
2 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
4 KIWI - Knowledge Information and Web Intelligence
LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions. Based on a noise model of transcription errors, Lead2Gold searches for better transcriptions of the training data with a beam search that takes this noise model into account. The beam search is differentiable and does not require a forced alignment step, so the whole system is trained end-to-end. Lead2Gold can be viewed as a new loss function that can be used on top of any sequence-to-sequence deep neural network. We conduct proof-of-concept experiments on noisy transcriptions generated from letter corruptions with different noise levels. We show that Lead2Gold obtains better ASR accuracy than a competitive baseline that does not account for the (artificially introduced) transcription noise.
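
The abstract mentions noisy transcriptions generated from letter corruptions at different noise levels, without detailing the procedure. As a rough illustration only, the sketch below corrupts each letter independently with probability `noise_level`, choosing uniformly between substitution, insertion, and deletion. The function name `corrupt`, the alphabet, and the uniform choice of operation are assumptions made for this example and are not taken from the paper.

```python
import random
import string

# Hypothetical output alphabet: lowercase letters, space, apostrophe.
ALPHABET = string.ascii_lowercase + " '"


def corrupt(transcription, noise_level, rng):
    """Corrupt each letter independently with probability `noise_level`.

    A corrupted letter is substituted, followed by a random inserted
    letter, or deleted, each with equal probability (an assumption).
    """
    out = []
    for ch in transcription:
        if rng.random() >= noise_level:
            out.append(ch)  # letter kept unchanged
            continue
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "substitute":
            # Replace with a different symbol from the alphabet.
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        elif op == "insert":
            # Keep the letter and add a random symbol after it.
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # "delete": append nothing, so the letter is dropped.
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    for level in (0.0, 0.1, 0.3):
        print(level, repr(corrupt("the cat sat on the mat", level, rng)))
```

A seeded `random.Random` instance is passed in explicitly so that a given noise level yields a reproducible corrupted training set.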

Cited literature: 30 references

https://hal.inria.fr/hal-02316572
Contributor: Adrien Dufraux
Submitted on: Tuesday, October 15, 2019 - 2:03:06 PM
Last modification on: Friday, October 18, 2019 - 1:25:27 PM
Document(s) archived on: Friday, January 17, 2020 - 8:40:10 AM

File

CAMERA_READY_ASRU_submission.p...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02316572, version 1

Citation

Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze. Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition. ASRU 2019 - IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2019, Singapore, Singapore. ⟨hal-02316572⟩
