K. Yao, D. Yu, L. Deng, and Y. Gong, A fast maximum likelihood feature transformation method for GMM-HMM speaker adaptation, Neurocomputing, vol.128, pp.145-152, 2014.

T. Viglino, P. Motlicek, and M. Cernak, End-to-end accented speech recognition, pp.2140-2144, 2019.

Y. Zhao, J. Li, S. Zhang, L. Chen, and Y. Gong, Domain and speaker adaptation for Cortana speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5984-5988, 2018.

D. Yu, K. Yao, H. Su, G. Li, and F. Seide, KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.7893-7897, 2013.

J. Neto, L. Almeida, M. Hochberg, C. Martins, L. Nunes et al., Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system, Eurospeech, pp.2171-2174, 1995.

K. Kumar, C. Liu, K. Yao, and Y. Gong, Intermediate-layer DNN adaptation for offline and session-based iterative speaker adaptation, pp.1091-1095, 2015.

S. M. Siniscalchi, J. Li, and C. Lee, Hermitian polynomial for speaker adaptation of connectionist speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, pp.2152-2161, 2013.

P. Swietojanski and S. Renals, Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models, IEEE Spoken Language Technology Workshop (SLT), pp.171-176, 2014.

M. Kitza, R. Schlüter, and H. Ney, Comparison of BLSTM layer specific affine transformations for speaker adaptation, Interspeech, pp.877-881, 2018.

N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.4, pp.788-798, 2011.

G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, Speaker adaptation of neural network acoustic models using i-vectors, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.55-59, 2013.

A. Senior and I. Lopez-Moreno, Improving DNN speaker independence with i-vector inputs, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.225-229, 2014.

X. Xie, X. Liu, T. Lee, and L. Wang, Fast DNN acoustic model adaptation by learning hidden unit contribution features, Interspeech, pp.759-763, 2019.

L. Samarakoon and K. C. Sim, Factorized hidden layer adaptation for deep neural network based acoustic modeling, IEEE Transactions on Audio, Speech, and Language Processing, vol.24, issue.12, pp.2241-2250, 2016.

T. Tan, Y. Qian, M. Yin, Y. Zhuang, and K. Yu, Cluster adaptive training for deep neural network, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4325-4329, 2015.

C. Wu and M. Gales, Multi-basis adaptive neural network for rapid adaptation in speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4315-4319, 2015.

L. M. Tomokiyo and A. Waibel, Adaptation methods for nonnative speech, Multilinguality in Spoken Language Processing, 2001.

M. Elfeky, M. Bastani, X. Velez, P. Moreno, and A. Waters, Towards acoustic model unification across dialects, IEEE Spoken Language Technology Workshop (SLT), pp.624-628, 2016.

K. Rao and H. Sak, Multi-accent speech recognition with hierarchical grapheme based models, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4815-4819, 2017.

T. S. Nguyen, K. Kilgour, M. Sperber, and A. Waibel, Improved speaker adaptation by combining i-vector and fMLLR with deep bottleneck networks, International Conference on Speech and Computer (SPECOM), pp.417-426, 2017.

X. Yang, K. Audhkhasi, A. Rosenberg, S. Thomas, B. Ramabhadran et al., Joint modeling of accents and acoustics for multi-accent speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1-5, 2018.

A. Jain, M. Upreti, and P. Jyothi, Improved accented speech recognition using accent embeddings and multi-task learning, Interspeech, pp.2454-2458, 2018.

Y. Huang, D. Yu, C. Liu, and Y. Gong, Multi-accent deep neural network acoustic model with accent-specific top layer, Interspeech, pp.2977-2981, 2014.

M. Chen, Z. Yang, J. Liang, Y. Li, and W. Liu, Improving deep neural networks based multi-accent Mandarin speech recognition using i-vectors and accent-specific top layer, 2015.

J. Huang, J. Li, D. Yu, L. Deng, and Y. Gong, Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.7304-7308, 2013.

J. Yi, Z. Wen, J. Tao, H. Ni, and B. Liu, CTC regularized model adaptation for improving LSTM RNN based multi-accent Mandarin speech recognition, Journal of Signal Processing Systems, vol.90, issue.7, pp.985-997, 2018.

D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, X-vectors: Robust DNN embeddings for speaker recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5329-5333, 2018.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, Tech. Rep., 2011.

A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, Phoneme recognition using time-delay neural networks, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.37, issue.3, pp.328-339, 1989.

D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar et al., Purely sequence-trained neural networks for ASR based on lattice-free MMI, pp.2751-2755, 2016.

V. Manohar, H. Hadian, D. Povey, and S. Khudanpur, Semi-supervised training of acoustic models using lattice-free MMI, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4844-4848, 2018.

W. Hess, K. Kohler, and H. Tillmann, Phondat-Verbmobil speech corpus, European Conference on Speech Communication and Technology, 1995.

Voxforge: an open and free speech corpus for speaker recognition, pp.2020-2023.

S. Yoo, I. Song, and Y. Bengio, A highly adaptive acoustic model for accurate multi-dialect speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5716-5720, 2019.

V. Peddinti, D. Povey, and S. Khudanpur, A time delay neural network architecture for efficient modeling of long temporal contexts, pp.3214-3218, 2015.

T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, Audio augmentation for speech recognition, pp.3586-3589, 2015.

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.