The second dihard diarization challenge: Dataset, task, and baselines, Proc. Interspeech, 2019. ,
Diarization is hard: Some experiences and lessons learned for the JHU team in the inaugural DIHARD challenge, Proc. Interspeech, 2018. ,
The fifth 'CHiME speech separation and recognition challenge: Dataset, task and baselines, Proc. Interspeech, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01744021
Second dihard challenge evaluation plan, Linguistic Data Consortium, Tech. Rep, 2019. ,
BUT system for DIHARD speech diarization challenge 2018, Proc. Interspeech, pp.2798-2802, 2018. ,
, Speech denoising with deep feature losses, 2018.
Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016. ,
Squeeze-and-excitation networks, Proc. IEEE CVPR, pp.7132-7141, 2018. ,
Adam: A method for stochastic optimization, 2014. ,
Weight normalization: A simple reparameterization to accelerate training of deep neural networks, Advances in Neural Information Processing Systems, pp.901-909, 2016. ,
Librispeech: an asr corpus based on public domain audio books, Proc. ICASSP, pp.5206-5210, 2015. ,
Building an open source automatic speech recognition system for catalan, Proc. IberSPEECH, pp.2018-2024, 2018. ,
ST-CMDS-20170001 1 -Free ST Chinese Mandarin Corpus ,
,
,
MUSAN: A Music, Speech, and Noise Corpus, 2015. ,
Pyroomacoustics: A python package for audio room simulation and array processing algorithms, Proc. ICASSP. IEEE, pp.351-355, 2018. ,
Minimum word error training of RNNbased voice activity detection, Proc. Interpseech, 2015. ,
, pyannote-audio: neural building blocks for speaker diarization
Improved closed set textindependent speaker identification by combining MFCC with evidence from flipped filter banks, International Journal of Signal Processing, vol.4, issue.2, pp.114-122, 2007. ,
Phoneme recognition using time-delay neural networks, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.37, issue.3, pp.328-339, 1989. ,
X-vectors: robust DNN embeddings for speaker recognition, Proc. ICASSP, pp.5329-5333, 2018. ,
Keras, 2015. ,
TensorFlow: large-scale machine learning on heterogeneous systems, 2015. ,
Rectified linear units improve restricted Boltzmann machines, Proc. ICML, pp.807-814, 2010. ,
Batch normalization: accelerating deep network training by reducing internal covariate shift, Proc. ICML, pp.448-456, 2015. ,
Understanding the difficulty of training deep feedforward neural networks, Proc. of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.249-256, 2010. ,
Adam: a method for stochastic optimization, Proc. ICLR, pp.1-15, 2015. ,
Acoustic beamforming for speaker diarization of meetings, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.7, pp.2011-2021, 2007. ,
Speaker diarization with enhancing speech for the first dihard challenge, Proc. Interspeech, pp.2793-2797, 2018. ,
Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network, 2018 IEEE Spoken Language Technology Workshop (SLT), pp.558-565, 2018. ,
Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition, IEEE Automatic Speech Recognition and Understanding Workshop (Submitted), 2019. ,
The Generalized Correlation Method for Estimation of Time Delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, pp.320-327, 1976. ,
Spatially pre-processed speech distortion weighted multi-channel Wiener filtering for noise reduction, Signal Processing, vol.84, issue.12, pp.2367-2387, 2004. ,
Deep clustering: Discriminative embeddings for segmentation and separation, Proc. ICASSP. IEEE, pp.31-35, 2016. ,
The EURECOM submission to the first DIHARD Challenge, Proc. INTERSPEECH, 2018. ,
Neural speech turn segmentation and affinity propagation for speaker diarization, Proc. Interspeech, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01912236