S. Makino, H. Sawada, and T. Lee, Blind Speech Separation, ser. Signals and Communication Technology, 2007.

M. Wölfel and J. McDonough, Distant Speech Recognition, 2009.

G. R. Naik and W. Wang, Blind Source Separation: Advances in Theory, Algorithms and Applications, ser. Signals and Communication Technology, 2014.
DOI : 10.1007/978-3-642-55016-4

E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound, IEEE Signal Processing Magazine, vol.31, issue.3, pp.107-115, 2014.
DOI : 10.1109/MSP.2013.2297440

URL : https://hal.archives-ouvertes.fr/hal-00922378

L. Deng and D. Yu, Deep Learning: Methods and Applications, Found. Trends Signal Process., vol.7, issue.3-4, 2014.
DOI : 10.1561/2000000039

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.650.4684

G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux et al., Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR, Proc. Int'l. Conf. Latent Variable Analysis and Signal Separation, 2015.
DOI : 10.1007/978-3-319-22482-4_11

URL : https://hal.archives-ouvertes.fr/hal-01163493

J. Chen, Y. Wang, and D. Wang, A feature study for classification-based speech separation at low signal-to-noise ratios, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol.22, issue.12, pp.1993-2002, 2014.

Y. Jiang, D. Wang, R. Liu, and Z. Feng, Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.12, pp.2112-2121, 2014.
DOI : 10.1109/TASLP.2014.2361023

S. Araki, T. Hayashi, M. Delcroix, M. Fujimoto, K. Takeda et al., Exploring multi-channel features for denoising-autoencoder-based speech enhancement, Proc. IEEE Int'l Conf. Acoust. Speech Signal Process. (ICASSP), pp.116-120, 2015.

Y. Tu, J. Du, Y. Xu, L. Dai, and C. Lee, Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers, The 9th International Symposium on Chinese Spoken Language Processing, pp.250-254, 2014.
DOI : 10.1109/ISCSLP.2014.6936615

P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, Singing-voice separation from monaural recordings using deep recurrent neural networks, Proc. Int'l. Soc. for Music Inf. Retrieval (ISMIR), pp.477-482, 2014.

S. Uhlich, F. Giron, and Y. Mitsufuji, Deep neural network based instrument extraction from music, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2135-2139, 2015.
DOI : 10.1109/ICASSP.2015.7178348

Y. Wang and D. Wang, Towards Scaling Up Classification-Based Speech Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.7, pp.1381-1390, 2013.
DOI : 10.1109/TASL.2013.2250961

A. Narayanan and D. Wang, Ideal ratio mask estimation using deep neural networks for robust speech recognition, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.7092-7096, 2013.
DOI : 10.1109/ICASSP.2013.6639038

F. Weninger, J. Le Roux, J. R. Hershey, and B. Schuller, Discriminatively trained recurrent neural networks for single-channel speech separation, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp.577-581, 2014.
DOI : 10.1109/GlobalSIP.2014.7032183

A. Narayanan and D. Wang, Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol.23, issue.1, pp.92-101, 2015.

Y. Wang and D. Wang, A deep neural network for time-domain signal reconstruction, Proc. IEEE Int'l Conf. Acoust. Speech Signal Process. (ICASSP), 2015.

H. Erdogan, J. R. Hershey, S. Watanabe, and J. Le Roux, Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks, Proc. IEEE Int'l Conf. Acoust. Speech Signal Process. (ICASSP), pp.708-712, 2015.

N. Q. Duong, E. Vincent, and R. Gribonval, Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1830-1840, 2010.
DOI : 10.1109/TASL.2010.2050716

URL : https://hal.archives-ouvertes.fr/inria-00435807

A. Ozerov, E. Vincent, and F. Bimbot, A General Flexible Framework for the Handling of Prior Information in Audio Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1118-1133, 2012.
DOI : 10.1109/TASL.2011.2172425

URL : https://hal.archives-ouvertes.fr/inria-00536917

T. Gerber, M. Dutasta, L. Girin, and C. Févotte, Professionally-produced music separation guided by covers, Proc. Int'l. Soc. for Music Inf. Retrieval (ISMIR), pp.85-90, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00807027

M. Togami and Y. Kawaguchi, Simultaneous Optimization of Acoustic Echo Reduction, Speech Dereverberation, and Noise Reduction against Mutual Interference, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.11, pp.1612-1623, 2014.
DOI : 10.1109/TASLP.2014.2341918

A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, Kernel Additive Models for Source Separation, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4298-4310, 2014.
DOI : 10.1109/TSP.2014.2332434

URL : https://hal.archives-ouvertes.fr/hal-01011044

A. Liutkus, D. Fitzgerald, and Z. Rafii, Scalable audio separation with light Kernel Additive Modelling, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.76-80, 2015.
DOI : 10.1109/ICASSP.2015.7177935

URL : https://hal.archives-ouvertes.fr/hal-01114890

S. Sivasankaran, A. A. Nugraha, E. Vincent, J. A. Morales-Cordovilla, S. Dalmia et al., Robust ASR using neural network based speech enhancement and feature simulation, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.482-489, 2015.
DOI : 10.1109/ASRU.2015.7404834

URL : https://hal.archives-ouvertes.fr/hal-01204553

E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, and M. E. Davies, Probabilistic Modeling Paradigms for Audio Source Separation, Machine Audition: Principles, Algorithms and Systems, pp.162-185, 2011.
DOI : 10.4018/978-1-61520-919-4.ch007

URL : https://hal.archives-ouvertes.fr/inria-00544016

N. Q. Duong, H. Tachibana, E. Vincent, N. Ono, R. Gribonval et al., Multichannel harmonic and percussive component separation by joint modeling of spatial and spectral continuity, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.205-208, 2011.
DOI : 10.1109/ICASSP.2011.5946376

URL : https://hal.archives-ouvertes.fr/inria-00557145

A. Liutkus and R. Badeau, Generalized Wiener filtering with fractional power spectrograms, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.266-270, 2015.
DOI : 10.1109/ICASSP.2015.7177973

URL : https://hal.archives-ouvertes.fr/hal-01110028

D. Liu, P. Smaragdis, and M. Kim, Experiments on deep learning for speech denoising, Proc. ISCA INTERSPEECH, pp.2685-2688, 2014.

Y. Xu, J. Du, L. Dai, and C. Lee, An Experimental Study on Speech Enhancement Based on Deep Neural Networks, IEEE Signal Processing Letters, vol.21, issue.1, pp.65-68, 2014.
DOI : 10.1109/LSP.2013.2291240

C. Févotte, N. Bertin, and J. Durrieu, Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, Neural Computation, vol.21, issue.3, pp.793-830, 2009.
DOI : 10.1016/j.sigpro.2007.01.024

A. Lefèvre, F. Bach, and C. Févotte, Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.313-316, 2011.
DOI : 10.1109/ASPAA.2011.6082314

N. Bertin, C. Févotte, and R. Badeau, A tempering approach for Itakura-Saito non-negative matrix factorization. With application to music transcription, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1545-1548, 2009.
DOI : 10.1109/ICASSP.2009.4959891

URL : https://hal.archives-ouvertes.fr/hal-00945283

C. Févotte and A. Ozerov, Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues, Proc. Int'l. Symp. on Comput. Music Modeling and Retrieval, 2010.
DOI : 10.1007/978-3-642-23126-1_8

A. Liutkus, D. Fitzgerald, and R. Badeau, Cauchy nonnegative matrix factorization, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.
DOI : 10.1109/WASPAA.2015.7336900

URL : https://hal.archives-ouvertes.fr/hal-01170924

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.504-511, 2015.
DOI : 10.1109/ASRU.2015.7404837

URL : https://hal.archives-ouvertes.fr/hal-01211376

J. Garofolo, D. Graff, D. Paul, and D. Pallett, CSR-I (WSJ0) Complete, Linguistic Data Consortium, 2007.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
DOI : 10.1109/TSA.2005.858005

URL : https://hal.archives-ouvertes.fr/inria-00544230

B. Loesch and B. Yang, Adaptive Segmentation and Separation of Determined Convolutive Mixtures under Dynamic Conditions, Proc. Int'l. Conf. Latent Variable Analysis and Signal Separation, pp.41-48, 2010.
DOI : 10.1007/978-3-642-15995-4_6

C. Blandin, A. Ozerov, and E. Vincent, Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012.
DOI : 10.1016/j.sigpro.2011.09.032

URL : https://hal.archives-ouvertes.fr/inria-00576297

J. McDonough and K. Kumatani, Microphone Arrays, in Techniques for Noise Robustness in Automatic Speech Recognition, Chichester, West Sussex, 2012.
DOI : 10.1002/9781118392683.ch6

K. Kumatani, J. McDonough, and B. Raj, Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors, IEEE Signal Processing Magazine, vol.29, issue.6, pp.127-140, 2012.
DOI : 10.1109/MSP.2012.2205285

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier networks, Proc. Int'l. Conf. Artificial Intelligence and Statistics (AISTATS), pp.315-323, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

A. A. Nugraha, K. Yamamoto, and S. Nakagawa, Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition, EURASIP Journal on Audio, Speech, and Music Processing, vol.2014, issue.1, 2014.
DOI : 10.1121/1.3257548

X. Jaureguiberry, E. Vincent, and G. Richard, Fusion Methods for Speech Enhancement and Audio Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.7, pp.1266-1279, 2016.
DOI : 10.1109/TASLP.2016.2553441

URL : https://hal.archives-ouvertes.fr/hal-01120685

Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, Neural Networks: Tricks of the Trade, pp.437-478, 2012.
DOI : 10.1162/089976602317318938

P. Sprechmann, A. M. Bronstein, and G. Sapiro, Supervised nonnegative matrix factorization for audio source separation, Excursions in Harmonic Analysis, ser. Applied and Numerical Harmonic Analysis, pp.407-420, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, arXiv e-prints, Feb. 2015.

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Proc. Conf. on Neural Information Processing Systems (NIPS), pp.153-160, 2006.

M. D. Zeiler, ADADELTA: An adaptive learning rate method, arXiv e-prints, 2012.

Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-Labastie, X. Jaureguiberry et al., The Flexible Audio Source Separation Toolbox Version 2.0, IEEE Int'l Conf. Acoust. Speech Signal Process. (ICASSP), 2014.

T. Hori, Z. Chen, H. Erdogan, J. R. Hershey, J. Le Roux et al., The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.475-481, 2015.
DOI : 10.1109/ASRU.2015.7404833

M. J. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, vol.12, issue.2, pp.75-98, 1998.
DOI : 10.1006/csla.1998.0043

K. Veselý, A. Ghoshal, L. Burget, and D. Povey, Sequence-discriminative training of deep neural networks, Proc. ISCA INTERSPEECH, pp.2345-2349, 2013.

R. Kneser and H. Ney, Improved backing-off for M-gram language modeling, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.181-184, 1995.
DOI : 10.1109/ICASSP.1995.479394

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, Proc. ISCA INTERSPEECH, pp.1045-1048, 2010.

Theano Development Team, Theano: A Python framework for fast computation of mathematical expressions, arXiv e-prints, 2016.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, Proc. IEEE Automat. Speech Recognition and Understanding Workshop (ASRU), 2011.