M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications, 2001.

M. Wölfel and J. McDonough, Distant Speech Recognition, 2009.

T. Virtanen, R. Singh, and B. Raj, Techniques for Noise Robustness in Automatic Speech Recognition, 2012.

E. Vincent, T. Virtanen, and S. Gannot, Audio Source Separation and Speech Enhancement, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01881431

L. Perotin, R. Serizel, E. Vincent, and A. Guérin, Multichannel speech separation with recurrent neural networks from high-order Ambisonics recordings, Proc. of ICASSP, pp.36-40, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01699759

M. A. Gerzon, Periphony: with-height sound reproduction, JAES, vol.21, issue.1, pp.2-10, 1973.

J. Herre, J. Hilpert, A. Kuntz, and J. Plogsties, MPEG-H 3D audio: The new standard for coding of immersive spatial audio, IEEE J. Sel. Topics Signal Process, vol.9, issue.5, pp.770-779, 2015.

V. Pulkki, Spatial sound reproduction with directional audio coding, JAES, vol.55, issue.6, pp.503-516, 2007.

D. P. Jarrett, E. A. Habets, and P. A. Naylor, 3D source localization in the spherical harmonic domain using a pseudointensity vector, Proc. of EUSIPCO, pp.442-446, 2010.

C. Evers, A. H. Moore, and P. A. Naylor, Multiple source localisation in the spherical harmonic domain, Proc. of IWAENC, pp.258-262, 2014.

T. E. Tuncer and B. Friedlander, Classical and modern direction-of-arrival estimation, 2009.

J. Dibiase, H. Silverman, and M. Brandstein, Robust localization in reverberant rooms, Microphone Arrays: Signal Processing Techniques and Applications, 2001.

C. Blandin, A. Ozerov, and E. Vincent, Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00576297

P. Pertilä, A. Brutti, P. Svaizer, and M. Omologo, Multichannel source activity detection, localization, and tracking, Audio Source Separation and Speech Enhancement, 2018.

C. Knapp and G. Carter, The generalized correlation method for estimation of time delay, IEEE Trans. Acoustics, Speech, Signal Process, vol.24, issue.4, pp.320-327, 1976.

M. S. Brandstein and H. F. Silverman, A robust method for speech signal time-delay estimation in reverberant rooms, Proc. of ICASSP, vol.1, pp.375-378, 1997.

R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propag, vol.34, issue.3, pp.276-280, 1986.

R. Roy and T. Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques, IEEE Trans. Acoustics, Speech, Signal Process, vol.37, issue.7, pp.984-995, 1989.

O. Nadiri and B. Rafaely, Localization of multiple speakers under high reverberation using a spherical microphone array and the direct-path dominance test, IEEE/ACM Trans. Audio, Speech and Lang. Proc, vol.22, issue.10, pp.1494-1505, 2014.

H. Sun, H. Teutsch, E. Mabande, and W. Kellermann, Robust localization of multiple sources in reverberant environments using EB-ESPRIT with spherical microphone arrays, Proc. of ICASSP, pp.117-120, 2011.

J. Merimaa and V. Pulkki, Spatial impulse response rendering I: Analysis and synthesis, JAES, vol.53, issue.12, pp.1115-1127, 2005.

S. Tervo, Direction estimation based on sound intensity vectors, Proc. EUSIPCO, pp.700-704, 2009.

S. Hafezi, A. H. Moore, and P. A. Naylor, Augmented intensity vectors for direction of arrival estimation in the spherical harmonic domain, IEEE/ACM Trans. Audio, Speech and Lang. Proc, vol.25, issue.10, pp.1956-1968, 2017.

N. Ma, G. J. Brown, and T. May, Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions, Proc. of Interspeech, pp.3302-3306, 2015.

X. Xiao, A learning-based approach to direction of arrival estimation in noisy and reverberant environments, Proc. of ICASSP, pp.2814-2818, 2015.

R. Takeda and K. Komatani, Sound source localization based on deep neural networks with directional activate function exploiting phase information, Proc. of ICASSP, pp.405-409, 2016.

S. Chakrabarty and E. A. Habets, Broadband DOA estimation using convolutional neural networks trained with noise signals, Proc. of WASPAA, pp.136-140, 2017.

S. Chakrabarty and E. A. Habets, Multi-speaker localization using convolutional neural network trained with noise, ML4Audio Workshop at NIPS, 2017.

S. Adavanne, A. Politis, and T. Virtanen, Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network, Proc. EUSIPCO, 2018.

D. Salvati, C. Drioli, and G. L. Foresti, Exploiting CNNs for improving acoustic source localization in noisy and reverberant conditions, IEEE Trans. Em. Topics Comput. Intell, vol.2, issue.2, pp.103-116, 2018.

W. He, P. Motlicek, and J. Odobez, Joint localization and classification of multiple sound sources using a multi-task neural network, Proc. Interspeech, pp.312-316, 2018.

S. Bach, A. Binder, G. Montavon, F. Klauschen, K. Müller et al., On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLOS ONE, vol.10, issue.7, p.e0130140, 2015.

E. Thuillier, H. Gamper, and I. J. Tashev, Spatial audio feature discovery with convolutional neural networks, Proc. of ICASSP, pp.6797-6801, 2018.
DOI : 10.1109/icassp.2018.8462315
URL : https://research.aalto.fi/files/28749078/ELEC_et_al_Spatial_audio_feature_discovery_with_convolutional_neural_networks_ICASSP_2018.pdf

S. Lapuschkin, A. Binder, G. Montavon, K. Müller, and W. Samek, Analyzing classifiers: Fisher vectors and deep neural networks, Proc. CVPR, pp.2912-2920, 2016.
DOI : 10.1109/cvpr.2016.318
URL : http://arxiv.org/pdf/1512.00172

L. Perotin, R. Serizel, E. Vincent, and A. Guérin, CRNN-based joint azimuth and elevation localization with the Ambisonics intensity vector, Proc. of IWAENC, pp.241-245, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01840453

J. Daniel, Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia, 2000.

V. Pulkki, S. Delikaris-Manias, and A. Politis, Parametric time-frequency domain spatial audio, 2017.

F. Jacobsen, A note on instantaneous and time-averaged active and reactive sound intensity, J. of Sound and Vibration, vol.147, issue.3, pp.489-496, 1991.

M. Baqué, Analyse de scène sonore multi-capteurs, 2017.

S. Ioffe and C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, Proc. of ICML, pp.448-456, 2015.

G. Montavon, W. Samek, and K. Müller, Methods for interpreting and understanding deep neural networks, Digital Signal Processing, vol.73, pp.1-15, 2018.

J. M. Zurada, A. Malinowski, and I. Cloete, Sensitivity analysis for minimization of input data dimension for feedforward neural network, Proc. of ISCAS, vol.6, pp.447-450, 1994.

L. Arras, G. Montavon, K. Müller, and W. Samek, Explaining recurrent neural network predictions in sentiment analysis, Proc. of WASSA, pp.159-168, 2017.
DOI : 10.18653/v1/w17-5221
URL : https://doi.org/10.18653/v1/w17-5221

S. Kitić and A. Guérin, TRAMP: Tracking by a Real-time AMbisonic-based Particle filter, LOCATA workshop at IWAENC, 2018.

J. Huang, N. Ohnishi, and N. Sugie, Sound localization in reverberant environment based on the model of the precedence effect, IEEE Trans. Instrum. Meas, vol.46, issue.4, pp.842-846, 1997.

C. Faller and J. Merimaa, Source localization in complex listening situations: selection of binaural cues based on interaural coherence, JASA, vol.116, issue.5, pp.3075-3089, 2004.

R. Y. Litovsky, H. S. Colburn, W. A. Yost, and S. J. Guzman, The precedence effect, JASA, vol.106, issue.4, pp.1633-1654, 1999.

J. B. Allen and D. A. Berkley, Image method for efficiently simulating small-room acoustics, JASA, vol.65, issue.4, pp.943-950, 1979.

E. A. Habets, Room impulse response generator, 2006.

mh acoustics, EM32 Eigenmike microphone array release notes (v17.0), Tech. Rep, 2013.

L. F. Lamel, J. Gauvain, and M. Eskénazi, BREF, a large vocabulary spoken corpus for French, Proc. of Eurospeech, pp.505-508, 1991.

T. Dozat, Incorporating Nesterov momentum into Adam, Stanford University, Tech. Rep, 2015.

H. W. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss et al., The LOCATA challenge data corpus for acoustic source localization and tracking, Proc. of SAM, pp.410-414, 2018.

H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, vol.2, issue.1-2, pp.83-97, 1955.

E. Vincent, S. Araki, and P. Bofill, The 2008 signal separation evaluation campaign: a community-based approach to large-scale evaluation, Proc. of ICA, pp.734-741, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00544168