Speech enhancement, 1983.
Speech enhancement, 2006.
Speech enhancement: theory and practice, 2007.
Visual contribution to speech intelligibility in noise, The Journal of the Acoustical Society of America, vol.26, issue.2, pp.212-215, 1954.
Auditory-visual perception of speech, Journal of Speech and Hearing Disorders, vol.40, issue.4, pp.481-492, 1975.
Quantifying the contribution of vision to speech perception in noise, British Journal of Audiology, vol.21, issue.2, pp.131-141, 1987.
Noisy speech enhancement with filters estimated from the speaker's lips, Proc. European Conference on Speech Communication and Technology, pp.1559-1562, 1995.
Audio-visual enhancement of speech in noise, The Journal of the Acoustical Society of America, vol.109, issue.6, pp.3007-3020, 2001.
Learning joint statistical models for audio-visual fusion and segregation, Proc. Advances in Neural Information Processing Systems (NIPS), pp.772-778, 2001.
Audiovisual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization), Proc. IEEE International Workshop on Sensor Array and Multichannel Signal Processing, pp.68-71, 2002.
Noisy audio feature enhancement using audio-visual speech data, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.2025-2028, 2002.
Audio-visual sound separation via hidden Markov models, Proc. Advances in Neural Information Processing Systems (NIPS), pp.1173-1180, 2002.
Twin-HMM-based audio-visual speech enhancement, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3726-3730, 2013.
The conversation: Deep audio-visual speech enhancement, Proc. Conference of the International Speech Communication Association (INTERSPEECH), pp.3244-3248, 2018.
Visual speech enhancement, Proc. Conference of the International Speech Communication Association (INTERSPEECH), pp.1170-1174, 2018.
Seeing through noise: Speaker separation and enhancement using visually-derived speech, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.3051-3055, 2018.
Audio-visual speech enhancement using multimodal deep convolutional neural networks, IEEE Transactions on Emerging Topics in Computational Intelligence, vol.2, issue.2, pp.117-128, 2018.
DNN driven speaker independent audio-visual mask estimation for speech separation, Proc. Conference of the International Speech Communication Association (INTERSPEECH), pp.2723-2727, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01868604
Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.716-720, 2018.
A variance modeling framework based on variational autoencoders for speech enhancement, Proc. IEEE International Workshop on Machine Learning for Signal Processing, pp.1-6, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01832826
Bayesian multichannel speech enhancement with a deep speech prior, Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp.1233-1239, 2018.
Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.101-105, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02005102
Speech enhancement with variational autoencoders and alpha-stable distributions, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.541-545, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02005106
A statistically principled and computationally efficient approach to speech enhancement using variational autoencoders, Proc. Conference of the International Speech Communication Association (INTERSPEECH), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02089062
Learning structured output representation using deep conditional generative models, Proc. Advances in Neural Information Processing Systems (NIPS), pp.3483-3491, 2015.
Multichannel speech enhancement based on time-frequency masking using subband long short-term memory, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.298-302, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02264247
NTCD-TIMIT: A new database and baseline for noise-robust audio-visual speech recognition, Proc. Conference of the International Speech Communication Association (INTERSPEECH), pp.3752-3756, 2017.
An audiovisual corpus for speech perception and automatic speech recognition, The Journal of the Acoustical Society of America, vol.120, issue.5, pp.2421-2424, 2006.
Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.27, issue.2, pp.113-120, 1979.
Enhancement and bandwidth compression of noisy speech, Proceedings of the IEEE, vol.67, issue.12, pp.1586-1604, 1979.
Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984.
Speech enhancement based on minimum mean-square error estimation and supergaussian priors, IEEE Transactions on Speech and Audio Processing, vol.13, issue.5, pp.845-856, 2005.
Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.6, pp.1741-1752, 2007.
Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.33, issue.2, pp.443-445, 1985.
Speech enhancement for nonstationary noise environments, Signal Processing, vol.81, issue.11, pp.2403-2418, 2001.
Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis, Neural Computation, vol.21, issue.3, pp.793-830, 2009.
Speech denoising using nonnegative matrix factorization with priors, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4029-4032, 2008.
Phoneme-dependent NMF for speech enhancement in monaural mixtures, Proc. Conference of the International Speech Communication Association (INTERSPEECH), pp.1217-1220, 2011.
Supervised and unsupervised speech enhancement using nonnegative matrix factorization, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2140-2151, 2013.
Supervised speech separation based on deep learning: An overview, IEEE Transactions on Audio, Speech, and Language Processing, vol.26, issue.10, pp.1702-1726, 2018.
Speech enhancement based on deep denoising autoencoder, Proc. Conference of the International Speech Communication Association, pp.436-440, 2013.
A regression approach to speech enhancement based on deep neural networks, IEEE Transactions on Audio, Speech, and Language Processing, vol.23, issue.1, pp.7-19, 2015.
SNR-aware convolutional neural network modeling for speech enhancement, Proc. Conference of the International Speech Communication Association (INTERSPEECH), pp.3768-3772, 2016.
Towards scaling up classification-based speech separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.7, pp.1381-1390, 2013.
On training targets for supervised speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.12, pp.1849-1858, 2014.
Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, Proc. International Conference on Latent Variable Analysis and Signal Separation, pp.91-99, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01163493
Semi-supervised learning with deep generative models, Proc. Advances in Neural Information Processing Systems (NIPS), pp.3581-3589, 2014.
Supervised determined source separation with multichannel variational autoencoder, Neural Computation, vol.31, issue.9, pp.1-24, 2019.
Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.546-550, 2019.
Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.96-100, 2019.
Visually derived Wiener filters for speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.6, pp.1642-1651, 2010.
A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, Journal of the American Statistical Association, vol.85, issue.411, pp.699-704, 1990.
An introduction to variational methods for graphical models, Machine Learning, vol.37, issue.2, pp.183-233, 1999.
Variational inference: A review for statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.
End-to-end audiovisual speech recognition, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.6548-6552, 2018.
β-VAE: Learning basic visual concepts with a constrained variational framework, International Conference on Learning Representations (ICLR), 2017.
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, issue.1, pp.1-38, 1977.
Monte Carlo Statistical Methods, 2005.
Algorithms for nonnegative matrix factorization with the β-divergence, Neural Computation, vol.23, issue.9, pp.2421-2456, 2011.
TIMIT acoustic phonetic continuous speech corpus, Linguistic Data Consortium, 1993.
FaNT: Filtering and noise adding tool, 2005.
The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings, Proc. International Congress on Acoustics, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00796707
Supervised and semi-supervised separation of sounds from single-channel mixtures, Proc. International Conference on Independent Component Analysis and Signal Separation, pp.414-421, 2007.
A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
Adam: A method for stochastic optimization, International Conference on Learning Representations (ICLR), 2015.
Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00544230
Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.749-752, 2001.
An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.7, pp.2125-2136, 2011.
A deep generative model of speech complex spectrograms, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.905-909, 2019.