, Microphone Arrays: Signal Processing Techniques and Applications, 2001.

E. Vincent, T. Virtanen, and S. Gannot, Audio Source Separation and Speech Enhancement, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01881431

M. Wölfel and J. Mcdonough, Distant Speech Recognition, 2009.

T. Virtanen, R. Singh, and B. Raj, Techniques for Noise Robustness in Automatic Speech Recognition, 2012.

C. Knapp and G. Carter, The generalized correlation method for estimation of time delay, IEEE Trans. Acoustics, Speech, Signal Process, vol.24, issue.4, pp.320-327, 1976.

M. S. Brandstein and H. F. Silverman, A robust method for speech signal time-delay estimation in reverberant rooms, Proc. of ICASSP, vol.1, pp.375-378, 1997.

R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propag, vol.34, issue.3, pp.276-280, 1986.

R. Roy and T. Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques, Speech, Sig. Proc, vol.37, pp.984-995, 1989.

J. Merimaa and V. Pulkki, Spatial impulse response rendering I: Analysis and synthesis, JAES, vol.53, issue.12, pp.1115-1127, 2005.

J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, 2008.

L. Perotin, R. Serizel, E. Vincent, and A. Guerin, CRNNbased multiple DoA estimation using acoustic intensity features for Ambisonics recordings, IEEE JSTSP, pp.22-33, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01839883

N. Roman, D. Wang, and G. J. Brown, A classification-based cocktail-party processor, Proc. of NIPS, pp.1425-1432, 2004.

K. W. Wilson and T. Darrell, Learning a precedence effectlike weighting function for the generalized cross-correlation framework, IEEE Trans. Audio, Speech, Lang, vol.14, issue.6, pp.2156-2164, 2006.

H. Kayser and J. Anemüller, A discriminative learning approach to probabilistic acoustic source localization, Proc. of IWAENC, pp.99-103, 2014.

T. May, S. Par, and A. Kohlrausch, A probabilistic model for robust localization based on a binaural auditory front-end, IEEE Trans. Audio, Speech, Lang, vol.19, issue.1, pp.1-13, 2011.

G. Hinton, L. Deng, D. Yu, and G. E. Dahl, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.

Y. Wang and D. Wang, Towards scaling up classificationbased speech separation, IEEE Trans. Audio, Speech, Lang, vol.21, issue.7, pp.1381-1390, 2013.

X. Xiao, A learning-based approach to direction of arrival estimation in noisy and reverberant environments, Proc. of ICASSP, pp.2814-2818, 2015.

F. Stöter, S. Chakrabarty, B. Edler, and E. A. Habets, Classification vs. regression in supervised learning for single channel speaker count estimation, Proc. of ICASSP, pp.436-440, 2018.

M. Everingham and A. Zisserman, Regression and classification approaches to eye localization in face images, Proc. of FGR, pp.441-446, 2006.

A. Défossez, N. Zeghidour, N. Usunier, L. Bottou, and F. Bach, SING: Symbol-to-instrument neural generator, Proc. of NIPS, pp.9041-9051, 2018.

A. V. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., WaveNet: a generative model for raw audio, 2016.

S. Chakrabarty and E. A. Habets, Broadband DOA estimation using convolutional neural networks trained with noise signals, Proc. of WASPAA, pp.136-140, 2017.

N. Ma, T. May, and G. J. Brown, Exploiting deep neural networks and head movements for robust binaural localization of multiple sources in reverberant environments, IEEE/ACM Trans. Audio, Speech and Lang. Proc, vol.25, issue.12, pp.2444-2453, 2017.

L. Perotin, R. Serizel, E. Vincent, and A. Guérin, CRNNbased joint azimuth and elevation localization with the Ambisonics intensity vector, Proc. of IWAENC, pp.241-245, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01840453

W. He, P. Motlicek, and J. Odobez, Deep neural networks for multiple speaker detection and localization, Proc. of ICRA, pp.74-79, 2018.

S. Adavanne, A. Politis, J. Nikunen, and T. Virtanen, Sound event localization and detection of overlapping sources using convolutional recurrent neural networks, J-STSP, 2018.

M. A. Gerzon, Periphony: with-height sound reproduction, JAES, vol.21, issue.1, pp.2-10, 1973.

J. W. Gibbs, Elementary principles in statistical mechanics. Charles Scribner's Sons, 1902.

E. A. Habets, Room impulse response generator, 2006.

L. F. Lamel, J. Gauvain, and M. Eskénazi, BREF, a large vocabulary spoken corpus for French, Proc. of Eurospeech, pp.505-508, 1991.

E. Vincent, S. Araki, and P. Bofill, The 2008 signal separation evaluation campaign: a community-based approach to large-scale evaluation, Proc. of ICA, pp.734-741, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00544168

P. Golik, P. Doetsch, and H. Ney, Cross-entropy vs squared error training: a theoretical and experimental comparison, Proc. Interspeech, vol.13, pp.1756-1760, 2013.