V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: an ASR corpus based on public domain audio books, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5206-5210, 2015.

D. B. Paul and J. M. Baker, The design for the Wall Street Journal-based CSR corpus, Proceedings of the Workshop on Speech and Natural Language, Association for Computational Linguistics, pp.357-362, 1992.

C. Cieri, D. Miller, and K. Walker, The Fisher corpus: a resource for the next generations of speech-to-text, LREC, vol.4, pp.69-71, 2004.

J. J. Godfrey, E. C. Holliman, and J. McDaniel, Switchboard: Telephone speech corpus for research and development, Proceedings of ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.1, pp.517-520, 1992.

K. Veselý, M. Hannemann, and L. Burget, Semi-supervised training of deep neural networks, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.267-272, 2013.

S. Karita, S. Watanabe, and T. Iwata, Semi-supervised end-to-end speech recognition, Interspeech, pp.2-6, 2018.

J. Huang, R. Child, V. Rao, H. Liu, S. Satheesh et al., Active learning for speech recognition: the power of gradients, 2016.

T. Drugman, J. Pylkkönen, and R. Kneser, Active and semi-supervised learning in ASR: Benefits on the acoustic and language models, 2019.

D. Yu, B. Varadarajan, L. Deng, and A. Acero, Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion, Computer Speech & Language, vol.24, pp.433-444, 2010.

D. Wang and T. Zheng, Transfer learning for speech and language processing, Proc. APSIPA Annual Summit and Conference, pp.1225-1237, 2015.

R. Collobert, C. Puhrsch, and G. Synnaeve, Wav2letter: an end-to-end ConvNet-based speech recognition system, 2016.

G. Patrini, A. Rozza, A. K. Menon, R. Nock, and L. Qu, Making deep neural networks robust to label noise: A loss correction approach, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1944-1952, 2017.

T. Kaneko, Y. Ushiku, and T. Harada, Label-noise robust generative adversarial networks, CoRR, 2018.

S. Sukhbaatar, J. Bruna, M. Paluri, L. Bourdev, and R. Fergus, Training convolutional networks with noisy labels, 2014.

I. Jindal, M. Nokleby, and X. Chen, Learning deep networks from noisy labels with dropout regularization, 2016 IEEE 16th International Conference on Data Mining (ICDM), pp.967-972, 2016.

J. Goldberger and E. Ben-reuven, Training deep neural-networks using a noise adaptation layer, ICLR, 2017.

T. Xiao, T. Xia, Y. Yang, C. Huang, and X. Wang, Learning from massive noisy labeled data for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2691-2699, 2015.

A. K. Menon, B. van Rooyen, and N. Natarajan, Learning from binary labels with instance-dependent corruption, 2016.

J. Guo, T. N. Sainath, and R. J. Weiss, A spelling correction model for end-to-end speech recognition, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp.5651-5655, 2019.

M. A. Hasegawa-Johnson, P. Jyothi, D. McCloy, M. Mirbagheri et al., ASR for under-resourced languages from probabilistic transcription, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.25, no.1, pp.50-63, 2017.

A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd International Conference on Machine Learning, pp.369-376, 2006.

D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar et al., Purely sequence-trained neural networks for ASR based on lattice-free MMI, Interspeech, pp.2751-2755, 2016.

R. Collobert, A. Hannun, and G. Synnaeve, A fully differentiable beam search decoder, 2019.

V. Pratap, A. Hannun, Q. Xu, J. Cai, J. Kahn et al., Wav2letter++: The fastest open-source speech recognition system, 2018.

V. Liptchinsky, G. Synnaeve, and R. Collobert, Letter-based speech recognition with gated convnets, 2017.

T. Salimans and D. P. Kingma, Weight normalization: A simple reparameterization to accelerate training of deep neural networks, Advances in Neural Information Processing Systems, pp.901-909, 2016.

Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, Language modeling with gated convolutional networks, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.933-941, 2017.

C. Lüscher, E. Beck, K. Irie, M. Kitza, W. Michel et al., RWTH ASR systems for LibriSpeech: Hybrid vs attention - w/o data augmentation, Interspeech, 2019.

J. Billa, Improving LSTM-CTC based ASR performance in domains with limited training data, 2017.