M. Wölfel and J. McDonough, Distant Speech Recognition, 2009.

L. Deng, Front-end, back-end, and hybrid techniques for noise-robust speech recognition, in Robust Speech Recognition of Uncertain or Missing Data - Theory and Applications, pp.67-99, 2011.

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015.
DOI : 10.1109/ASRU.2015.7404837

URL : https://hal.archives-ouvertes.fr/hal-01211376

K. Kumatani, J. McDonough, and B. Raj, Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors, IEEE Signal Processing Magazine, vol.29, issue.6, pp.127-140, 2012.
DOI : 10.1109/MSP.2012.2205285

E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound, IEEE Signal Processing Magazine, vol.31, issue.3, pp.107-115, 2014.
DOI : 10.1109/MSP.2013.2297440

URL : https://hal.archives-ouvertes.fr/hal-00922378

L. Deng and D. Yu, Deep Learning: Methods and Applications, Foundations and Trends® in Signal Processing, vol.7, issue.3-4, pp.197-387, 2014.
DOI : 10.1561/2000000039

F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux et al., Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR, Proc. 12th Int'l Conf. on Latent Variable Analysis and Signal Separation, 2015.
DOI : 10.1007/978-3-319-22482-4_11

URL : https://hal.archives-ouvertes.fr/hal-01163493

Y. Tu, J. Du, Y. Xu, L. Dai, and C. Lee, Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers, The 9th International Symposium on Chinese Spoken Language Processing, pp.250-254, 2014.
DOI : 10.1109/ISCSLP.2014.6936615

P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, Deep learning for monaural speech separation, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1562-1566, 2014.
DOI : 10.1109/ICASSP.2014.6853860

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.642.1937

F. Weninger, J. Le Roux, J. R. Hershey, and B. Schuller, Discriminatively trained recurrent neural networks for single-channel speech separation, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp.577-581, 2014.
DOI : 10.1109/GlobalSIP.2014.7032183

H. Hirsch and D. Pearce, The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, Proc. ISCA Tutorial and Research Workshop ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium, 2000.

E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta et al., The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.126-130, 2013.
DOI : 10.1109/ICASSP.2013.6637622

L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad et al., The DIRHA simulated corpus, Proc. Language Resources and Evaluation Conf. (LREC), pp.2629-2634, 2014.

G. W. Taylor, G. E. Hinton, and S. T. Roweis, Modeling human motion using binary latent variables, Proc. Conf. on Neural Information Processing Systems (NIPS), pp.1345-1352, 2006.

R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, 2012.

R. A. Gopinath, Maximum likelihood modeling with Gaussian distributions for classification, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.661-664, 1998.
DOI : 10.1109/ICASSP.1998.675351

M. J. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, vol.12, issue.2, pp.75-98, 1998.
DOI : 10.1006/csla.1998.0043

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.444

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, Proc. ISCA INTERSPEECH, Makuhari, Japan, pp.1045-1048, 2010.

G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

N. Q. Duong, E. Vincent, and R. Gribonval, Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1830-1840, 2010.
DOI : 10.1109/TASL.2010.2050716

URL : https://hal.archives-ouvertes.fr/inria-00435807

A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, Kernel Additive Models for Source Separation, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4298-4310, 2014.
DOI : 10.1109/TSP.2014.2332434

URL : https://hal.archives-ouvertes.fr/hal-01011044

B. Loesch and B. Yang, Adaptive Segmentation and Separation of Determined Convolutive Mixtures under Dynamic Conditions, Proc. 9th Int'l Conf. on Latent Variable Analysis and Signal Separation, pp.41-48, 2010.
DOI : 10.1007/978-3-642-15995-4_6

C. Blandin, A. Ozerov, and E. Vincent, Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012.
DOI : 10.1016/j.sigpro.2011.09.032

URL : https://hal.archives-ouvertes.fr/inria-00576297

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier networks, Proc. Int'l. Conf. Artificial Intelligence and Statistics (AISTATS), pp.315-323, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

A. A. Nugraha, K. Yamamoto, and S. Nakagawa, Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition, EURASIP Journal on Audio, Speech, and Music Processing, vol.2014, issue.1, 2014.
DOI : 10.1121/1.3257548

S. Uhlich, F. Giron, and Y. Mitsufuji, Deep neural network based instrument extraction from music, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2135-2139, 2015.
DOI : 10.1109/ICASSP.2015.7178348

X. Jaureguiberry, E. Vincent, and G. Richard, Fusion methods for audio source separation. Available: https://hal.archives-ouvertes, p.1120685, 2014.

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Proc. Conf. on Neural Information Processing Systems (NIPS), pp.153-160, 2006.

S. Duffner and C. Garcia, An Online Backpropagation Algorithm with Validation Error-Based Adaptive Learning Rate, Proc. Int'l. Conf. Artificial Neural Networks (ICANN), pp.249-258, 2007.
DOI : 10.1007/978-3-540-74690-4_26

I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton, On the importance of initialization and momentum in deep learning, Proc. Int'l. Conf. Machine Learning (ICML), pp.1139-1147, 2013.

A. Liutkus, D. Fitzgerald, and Z. Rafii, Scalable audio separation with light Kernel Additive Modelling, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.76-80, 2015.
DOI : 10.1109/ICASSP.2015.7177935

URL : https://hal.archives-ouvertes.fr/hal-01114890

A. Ozerov, E. Vincent, and F. Bimbot, A General Flexible Framework for the Handling of Prior Information in Audio Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1118-1133, 2012.
DOI : 10.1109/TASL.2011.2172425

URL : https://hal.archives-ouvertes.fr/inria-00536917

Z. Ling, S. Kang, H. Zen, A. Senior, M. Schuster et al., Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends, IEEE Signal Processing Magazine, vol.32, issue.3, pp.35-52, 2015.
DOI : 10.1109/MSP.2014.2359987

T. Nakashika, T. Takiguchi, and Y. Ariki, Voice conversion in time-invariant speaker-independent space, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7889-7893, 2014.
DOI : 10.1109/ICASSP.2014.6855136

A. Fischer and C. Igel, An Introduction to Restricted Boltzmann Machines, Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pp.14-36, 2012.
DOI : 10.1007/978-3-642-33275-3_2

J. Garofalo, D. Graff, D. Paul, and D. Pallett, CSR-I (WSJ0) Complete, Linguistic Data Consortium, 2007.

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.123

S. P. Rath, D. Povey, and K. Veselý, Improved feature processing for deep neural networks, Proc. ISCA INTERSPEECH, pp.109-113, 2013.

K. Veselý, A. Ghoshal, L. Burget, and D. Povey, Sequence-discriminative training of deep neural networks, Proc. ISCA INTERSPEECH, pp.2345-2349, 2013.