Conference papers

Achieving multi-accent ASR via unsupervised acoustic model adaptation

M. A. Tuğtekin Turan 1 Emmanuel Vincent 1 Denis Jouvet 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Current automatic speech recognition (ASR) systems trained on native speech often perform poorly when applied to non-native or accented speech. In this work, we propose to compute x-vector-like accent embeddings and use them as auxiliary inputs to an acoustic model trained on native data only, in order to improve the recognition of multi-accent data comprising native, non-native, and accented speech. In addition, we leverage untranscribed accented training data by means of semi-supervised learning. Our experiments show that acoustic models trained with the proposed accent embeddings outperform those trained with conventional i-vector or x-vector speaker embeddings, and achieve a 15% relative word error rate (WER) reduction on non-native and accented speech relative to acoustic models trained with regular spectral features only. Semi-supervised training using just 1 hour of untranscribed speech per accent yields an additional 15% relative WER reduction relative to models trained on native data only.
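The core idea of the abstract, feeding an utterance-level accent embedding to the acoustic model as an auxiliary input alongside the per-frame spectral features, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and all dimensions (40-dim filterbanks, 100-dim embedding) are hypothetical, and the embedding itself would come from an x-vector-style extractor.

```python
import numpy as np

def augment_with_accent_embedding(features, accent_embedding):
    """Tile a per-utterance accent embedding across time and
    concatenate it to each frame's spectral features.

    features:         (T, F) array of per-frame features (e.g. filterbanks)
    accent_embedding: (D,)   x-vector-like utterance-level embedding
    returns:          (T, F + D) augmented acoustic-model input
    """
    T = features.shape[0]
    tiled = np.tile(accent_embedding, (T, 1))  # repeat embedding: (T, D)
    return np.concatenate([features, tiled], axis=1)

# Illustrative dimensions only: 200 frames of 40-dim filterbanks plus a
# 100-dim accent embedding give a 140-dim per-frame network input.
feats = np.random.randn(200, 40)
emb = np.random.randn(100)
augmented = augment_with_accent_embedding(feats, emb)
print(augmented.shape)  # (200, 140)
```

Because the embedding is constant across the utterance, the network can condition its per-frame predictions on the accent without the accent labels being needed at recognition time.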

https://hal.inria.fr/hal-02907929
Contributor: Emmanuel Vincent
Submitted on: Sunday, August 2, 2020 - 2:28:22 AM
Last modification on: Monday, August 3, 2020 - 10:34:15 AM

File

cameraReady_2742.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02907929, version 1

Citation

M. A. Tuğtekin Turan, Emmanuel Vincent, Denis Jouvet. Achieving multi-accent ASR via unsupervised acoustic model adaptation. INTERSPEECH 2020, Oct 2020, Shanghai, China. ⟨hal-02907929⟩
