P. C. Loizou, Speech enhancement: theory and practice, 2007.

E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, and M. E. Davies, Probabilistic Modeling Paradigms for Audio Source Separation, Machine Audition: Principles, Algorithms and Systems, Wenwu Wang, pp.162-185, 2010.
DOI : 10.4018/978-1-61520-919-4.ch007

URL : https://hal.archives-ouvertes.fr/inria-00544016

C. Févotte, N. Bertin, and J. Durrieu, Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, Neural Computation, vol.21, issue.3, pp.793-830, 2009.
DOI : 10.1162/neco.2008.04-08-771

D. Wang and J. Chen, Supervised Speech Separation Based on Deep Learning: An Overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.26, issue.10, 2018.
DOI : 10.1109/TASLP.2018.2842159

A. A. Nugraha, A. Liutkus, and E. Vincent, Multichannel Audio Source Separation With Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.9, pp.1652-1664, 2016.
DOI : 10.1109/TASLP.2016.2580946

URL : https://hal.archives-ouvertes.fr/hal-01163369

P. Smaragdis and S. Venkataramani, A neural network alternative to non-negative audio models, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.86-90, 2017.
DOI : 10.1109/ICASSP.2017.7952123

Y. Bando, M. Mimura, K. Itoyama, K. Yoshii, and T. Kawahara, Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), pp.716-720, 2018.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, Proc. Int. Conf. Learning Representations (ICLR), 2014.

G. C. G. Wei and M. A. Tanner, A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, Journal of the American Statistical Association, vol.85, issue.411, pp.699-704, 1990.

Y. Xu, J. Du, L. Dai, and C. Lee, A regression approach to speech enhancement based on deep neural networks, IEEE/ACM Trans. Audio, Speech, Language Process., vol.23, issue.1, pp.7-19, 2015.

A. Liutkus, R. Badeau, and G. Richard, Gaussian Processes for Underdetermined Source Separation, IEEE Transactions on Signal Processing, vol.59, issue.7, pp.3155-3167, 2011.
DOI : 10.1109/TSP.2011.2119315

URL : https://hal.archives-ouvertes.fr/hal-00643951

E. Ollila, J. Eriksson, and V. Koivunen, Complex Elliptically Symmetric Random Variables - Generation, Characterization, and Circularity Tests, IEEE Transactions on Signal Processing, vol.59, issue.1, pp.58-69, 2011.
DOI : 10.1109/TSP.2010.2083655

A. Liutkus and R. Badeau, Generalized Wiener filtering with fractional power spectrograms, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.266-270, 2015.
DOI : 10.1109/ICASSP.2015.7177973

URL : https://hal.archives-ouvertes.fr/hal-01110028

K. Yoshii, K. Itoyama, and M. Goto, Student's T nonnegative matrix factorization and positive semidefinite tensor factorization for single-channel audio source separation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.51-55, 2016.
DOI : 10.1109/ICASSP.2016.7471635

P. Smaragdis, B. Raj, and M. Shashanka, Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures, Proc. Int. Conf. Indep. Component Analysis and Signal Separation, pp.414-421, 2007.
DOI : 10.1007/978-3-540-74494-8_52

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, issue.1, pp.1-38, 1977.

C. P. Robert and G. Casella, Monte Carlo Statistical Methods, 2005.

K. S. Chan and J. Ledolter, Monte Carlo EM Estimation for Time Series Models Involving Counts, Journal of the American Statistical Association, vol.90, issue.429, pp.242-252, 1995.
DOI : 10.1080/01621459.1995.10476508

C. Févotte and J. Idier, Algorithms for Nonnegative Matrix Factorization with the β-Divergence, Neural Computation, vol.23, issue.9, pp.2421-2456, 2011.
DOI : 10.1162/NECO_a_00168

D. R. Hunter and K. Lange, A tutorial on MM algorithms, The American Statistician, vol.58, issue.1, pp.30-37, 2004.

C. F. Wu, On the Convergence Properties of the EM Algorithm, The Annals of Statistics, vol.11, issue.1, pp.95-103, 1983.
DOI : 10.1214/aos/1176346060

Supporting document

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett et al., TIMIT acoustic phonetic continuous speech corpus, Linguistic Data Consortium, 1993.

J. Thiemann, N. Ito, and E. Vincent, The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings, Proc. Int. Congress on Acoustics (ICA), 2013.
DOI : 10.1121/1.4799597

URL : https://hal.archives-ouvertes.fr/hal-00796707

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proc. Int. Conf. Artificial Intelligence and Statistics (AISTATS), pp.249-256, 2010.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
DOI : 10.1109/TSA.2005.858005

URL : https://hal.archives-ouvertes.fr/inria-00544230

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), pp.749-752, 2001.
DOI : 10.1109/ICASSP.2001.941023