Discriminative importance weighting of augmented training data for acoustic model training

Sunit Sivasankaran 1, Emmanuel Vincent 1, Irina Illina 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : DNN-based acoustic models require a large amount of training data. Parametric data augmentation techniques, such as adding noise or reverberation or changing the speech rate, are often employed to increase the dataset size and improve ASR performance. The choice of augmentation techniques and their associated parameters has so far been handled heuristically. In this work, we propose an algorithm to automatically weight data perturbed using a variety of augmentation techniques and/or parameters. The weights are learned discriminatively, by iterating the standard gradient descent algorithm, so as to minimize the frame error rate. Experiments were performed on the CHiME-3 dataset, with data augmentation done by adding noise at different SNRs. The proposed data weighting algorithm yielded a relative WER improvement of 15% compared to the unweighted augmented dataset. Interestingly, the resulting distribution of SNRs in the weighted training set differs significantly from that of the test set.
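The augmentation step mentioned in the abstract, adding noise at different SNRs, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the synthetic sine "speech", the Gaussian noise, and the SNR grid (0 to 20 dB) are all assumptions chosen for illustration. It only shows the usual way of scaling a noise signal so that the mixture reaches a target SNR.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")
    # SNR_dB = 10 * log10(P_speech / P_noise)  =>  P_noise = P_speech / 10^(SNR_dB / 10)
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of a mixture, computed from the clean signal and the added residual."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Hypothetical stand-ins for real recordings: a 440 Hz tone and white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One augmented copy per SNR; the SNR grid here is an assumption, not the paper's.
noisy_by_snr = {snr: mix_at_snr(speech, noise, snr) for snr in (0, 5, 10, 15, 20)}
```

Each entry of `noisy_by_snr` is one perturbed copy of the same utterance. In the setting described by the abstract, copies like these would form the augmented pool whose relative importance is then learned, rather than fixed by hand.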
Document type :
Conference papers

https://hal.inria.fr/hal-01415759
Contributor : Emmanuel Vincent
Submitted on : Monday, March 6, 2017 - 1:35:30 PM
Last modification on : Wednesday, April 3, 2019 - 1:22:54 AM
Long-term archiving on : Wednesday, June 7, 2017 - 1:54:19 PM

File

sivasankaran_ICASSP17.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01415759, version 2

Citation

Sunit Sivasankaran, Emmanuel Vincent, Irina Illina. Discriminative importance weighting of augmented training data for acoustic model training. 42nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), Mar 2017, New Orleans, United States. ⟨hal-01415759v2⟩
