P. C. Loizou, Speech Enhancement: Theory and Practice, 2nd ed., CRC Press, 2013.

E. Vincent, T. Virtanen, and S. Gannot, Eds., Audio Source Separation and Speech Enhancement, Wiley, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01881431

F. Weninger, J. R. Hershey, J. Le Roux, and B. W. Schuller, Discriminatively trained recurrent neural networks for single-channel speech separation, IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 577-581, 2014.

J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, Deep clustering: discriminative embeddings for segmentation and separation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 31-35, 2016.

X. Lu, Y. Tsao, S. Matsuda, and C. Hori, Speech enhancement based on deep denoising autoencoder, INTERSPEECH, pp. 436-440, 2013.

A. A. Nugraha, A. Liutkus, and E. Vincent, Multichannel audio source separation with deep neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1652-1664, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01163369

A. Ozerov and C. Févotte, Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 550-563, 2010.

S. Leglaive, L. Girin, and R. Horaud, A variance modeling framework based on variational autoencoders for speech enhancement, IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01832826

S. Leglaive, U. Simsekli, A. Liutkus, L. Girin, and R. Horaud, Speech enhancement with variational autoencoders and alpha-stable distributions, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02005106

Y. Bando, M. Mimura, K. Itoyama, K. Yoshii, and T. Kawahara, Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 716-720, 2018.

S. Leglaive, L. Girin, and R. Horaud, Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 101-105, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02005102

K. Sekiguchi, Y. Bando, K. Yoshii, and T. Kawahara, Bayesian multichannel speech enhancement with a deep speech prior, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1233-1239, 2018.

H. Kameoka, L. Li, S. Inoue, and S. Makino, Semi-blind source separation with multichannel variational autoencoder, arXiv preprint, 2018.

S. Seki, H. Kameoka, L. Li, T. Toda, and K. Takeda, Generalized multichannel variational autoencoder for underdetermined source separation, arXiv preprint, 2018.

L. Li, H. Kameoka, and S. Makino, Fast MVAE: joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier, arXiv preprint (CoRR), 2018.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, International Conference on Learning Representations (ICLR), 2014.

G. Wei and M. Tanner, A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, Journal of the American Statistical Association, vol. 85, no. 411, pp. 699-704, 1990.

C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 2005.

E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, and M. E. Davies, Probabilistic modeling paradigms for audio source separation, in Machine Audition: Principles, Algorithms and Systems, pp. 162-185, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00544016

S. Ruder, An overview of gradient descent optimization algorithms, arXiv preprint, 2016.

M. Pariente, A. Deleforge, and E. Vincent, A statistically principled and computationally efficient approach to speech enhancement using variational autoencoders: supporting document, Inria, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02089062

C. Févotte, N. Bertin, and J.-L. Durrieu, Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis, Neural Computation, vol. 21, no. 3, pp. 793-830, 2009.

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett et al., TIMIT acoustic-phonetic continuous speech corpus, Linguistic Data Consortium, 1993.

J. Thiemann, N. Ito, and E. Vincent, The diverse environments multi-channel acoustic noise database (DEMAND): A database of multichannel environmental noise recordings, Proceedings of the International Congress on Acoustics, p. 3591, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00796707

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint, 2014.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 1462-1469, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00544230

A. A. Nugraha, K. Sekiguchi, and K. Yoshii, A deep generative model of speech complex spectrograms, arXiv preprint, 2019.

P. Magron and T. Virtanen, Bayesian anisotropic Gaussian model for audio source separation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 166-170, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01632081

A. Liutkus, C. Rohlfing, and A. Deleforge, Audio source separation with magnitude priors: The BEADS model, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 56-60, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01713886