J. Y. Kim, C. Liu, R. A. Calvo, K. McCabe, S. C. Taylor et al., A comparison of online automatic speech recognition systems and the nonverbal responses to unintelligible speech, 2019.

B. Rizk, Evaluation of state of art open-source ASR engines with local inferencing, 2019.

A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks, 23rd International Conference on Machine Learning (ICML), pp.369-376, 2006.

D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar et al., Purely sequence-trained neural networks for ASR based on lattice-free MMI, in Interspeech, pp.2751-2755, 2016.

H. Hadian, H. Sameti, D. Povey, and S. Khudanpur, End-to-end speech recognition using lattice-free MMI, in Interspeech, pp.12-16, 2018.

V. Manohar, H. Hadian, D. Povey, and S. Khudanpur, Semi-supervised training of acoustic models using lattice-free MMI, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4844-4848, 2018.

P. Ghahremani, V. Manohar, H. Hadian, D. Povey, and S. Khudanpur, Investigation of transfer learning for ASR using LF-MMI trained neural networks, 2017 IEEE Automatic Speech Recognition and Understanding Workshop, pp.279-286, 2017.

T. Lo and B. Chen, Semi-supervised training of acoustic models leveraging knowledge transferred from out-of-domain data, 2019 APSIPA Annual Summit and Conference, pp.1400-1404, 2019.

A. Carmantini, P. Bell, and S. Renals, Untranscribed web audio for low resource speech recognition, in Interspeech, pp.226-230, 2019.

J. Fainberg, O. Klejch, S. Renals, and P. Bell, Lattice-based lightly-supervised acoustic model training, in Interspeech, pp.1596-1600, 2019.

S. Tong, A. Vyas, P. N. Garner, and H. Bourlard, Unbiased semi-supervised LF-MMI training using dropout, in Interspeech, pp.1576-1580, 2019.

A. Vyas, P. Dighe, S. Tong, and H. Bourlard, Analyzing uncertainties in speech recognition using dropout, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6730-6734, 2019.

Y. Tam, Y. Lei, J. Zheng, and W. Wang, ASR error detection using recurrent neural network language model and complementary ASR, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2312-2316, 2014.

R. Errattahi, A. E. Hannani, F. Z. Salmam, and H. Ouahmane, Incorporating label dependency for ASR error detection via RNN, Procedia Computer Science, vol.148, pp.266-272, 2019.

K. Lybarger, M. Ostendorf, and M. Yetisgen, Automatically detecting likely edits in clinical notes created using automatic speech recognition, AMIA Annual Symposium, 2017.

S. Thomas, M. L. Seltzer, K. Church, and H. Hermansky, Deep neural network features and semi-supervised training for low resource speech recognition, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6704-6708, 2013.

K. Veselý, L. Burget, and J. Černocký, Semi-supervised DNN training with word selection for ASR, in Interspeech, pp.3687-3691, 2017.

F. Grézl and M. Karafiát, Semi-supervised bootstrapping approach for neural network feature extractor training, 2013 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.470-475, 2013.

P. Zhang, Y. Liu, and T. Hain, Semi-supervised DNN training in meeting recognition, 2014 IEEE Spoken Language Technology Workshop, pp.141-146, 2014.

H. Xu, D. Povey, L. Mangu, and J. Zhu, Minimum Bayes risk decoding and system combination based on a recursion for edit distance, Computer Speech and Language, vol.25, issue.4, pp.802-828, 2011.

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: An ASR corpus based on public domain audio books, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5206-5210, 2015.

Mozilla Common Voice. Last accessed 2020.

S. Burger, K. Weilhammer, F. Schiel, and H. G. Tillmann, Verbmobil data collection and annotation, in Verbmobil: Foundations of Speech-to-Speech Translation, W. Wahlster, Ed., Springer, pp.537-549, 2000.

G. Parent and M. Eskenazi, Toward better crowdsourced transcription: Transcription of a year of the Let's Go bus information system data, 2010 IEEE Spoken Language Technology Workshop (SLT), pp.312-317, 2010.

DialRC, The integral LET'S GO! dataset. Last accessed 2020.

G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, Speaker adaptation of neural network acoustic models using i-vectors, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.55-59, 2013.

Kaldi-help: train more2 in chain / nnet3 scenario. Last accessed 2020.