B. Massé, S. Ba, and R. Horaud, Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
DOI : 10.1109/TPAMI.2017.2782819

A. Cretual and F. Chaumette, Application of Motion-Based Visual Servoing to Target Tracking, The International Journal of Robotics Research, vol.20, issue.11, 2001.

C. Gaskett, L. Fletcher, and A. Zelinsky, Reinforcement learning for visual servoing of a mobile robot, Australian Conference on Robotics and Automation, 2000.

G. Bustamante, P. Danés, T. Forgue, and A. Podlubne, Towards information-based feedback control for binaural active localization, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7472894

URL : https://hal.archives-ouvertes.fr/hal-01969304

A. Magassouba, N. Bertin, and F. Chaumette, Aural Servo: Sensor-Based Control From Robot Audition, IEEE Transactions on Robotics, vol.34, issue.3, 2018.
DOI : 10.1109/TRO.2018.2805310

URL : https://hal.archives-ouvertes.fr/hal-01694366

R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1998.

A. Ghadirzadeh, J. Bütepage, A. Maki, D. Kragic, and M. Björkman, A sensorimotor reinforcement learning framework for physical Human-Robot Interaction, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.2682-2688, 2016.
DOI : 10.1109/IROS.2016.7759417

N. Mitsunaga, C. Smith, T. Kanda, H. Ishiguro, and N. Hagita, Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning, JRSJ, 2006.

A. L. Thomaz, G. Hoffman, and C. Breazeal, Reinforcement Learning with Human Teachers: Understanding How People Want to Teach Robots, ROMAN 2006, The 15th IEEE International Symposium on Robot and Human Interactive Communication, pp.352-357, 2006.
DOI : 10.1109/ROMAN.2006.314459

F. Cruz, G. I. Parisi, J. Twiefel, and S. Wermter, Multi-modal integration of dynamic audiovisual patterns for an interactive reinforcement learning scenario, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.759-766, 2016.
DOI : 10.1109/IROS.2016.7759137

M. Rothbucher, C. Denk, and K. Diepold, Robotic gaze control using reinforcement learning, 2012 IEEE International Workshop on Haptic Audio Visual Environments and Games (HAVE 2012) Proceedings, 2012.
DOI : 10.1109/HAVE.2012.6374444

A. H. Qureshi, Y. Nakamura, Y. Yoshikawa, and H. Ishiguro, Robot gains social intelligence through multimodal deep reinforcement learning, 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), pp.745-751, 2016.
DOI : 10.1109/HUMANOIDS.2016.7803357

A. H. Qureshi, Y. Nakamura, Y. Yoshikawa, and H. Ishiguro, Show, attend and interact: Perceivable human-robot social interaction through neural attention Q-network, IEEE ICRA, 2017.

M. Vázquez, A. Steinfeld, and S. E. Hudson, Maintaining awareness of the focus of attention of a conversation: A robot-centric reinforcement learning approach, 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2016.
DOI : 10.1109/ROMAN.2016.7745088

M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke, Towards a humanoid museum guide robot that interacts with multiple persons, 5th IEEE-RAS International Conference on Humanoid Robots, pp.418-423, 2005.
DOI : 10.1109/ICHR.2005.1573603

Y. Ban, X. Alameda-Pineda, F. Badeig, S. Ba, and R. Horaud, Tracking a varying number of people with a visually-controlled robotic head, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
DOI : 10.1109/IROS.2017.8206274

URL : https://hal.archives-ouvertes.fr/hal-01542987

S. Yun, A gaze control of socially interactive robots in multiple-person interaction, Robotica, vol.35, issue.11, pp.2122-2138, 2017.

Z. Cao, T. Simon, S. Wei, and Y. Sheikh, Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.143

X. Li, L. Girin, R. Horaud, and S. Gannot, Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization With Spatial Sparsity Regularization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.10, 2017.
DOI : 10.1109/TASLP.2017.2740001

URL : https://hal.archives-ouvertes.fr/hal-01413417

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, 2015.
DOI : 10.1038/nature14236

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, 1997.
DOI : 10.1162/neco.1997.9.8.1735

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, ICML, 2015.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.

I. Gebru, S. Ba, X. Li, and R. Horaud, Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.5, 2017.
DOI : 10.1109/TPAMI.2017.2648793

URL : https://hal.archives-ouvertes.fr/hal-01413403

F. Badeig, Q. Pelorson, S. Arias, V. Drouard, I. Gebru et al., A Distributed Architecture for Interacting with NAO, Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ICMI '15, pp.385-386, 2015.
DOI : 10.1145/2818346.2823303

URL : https://hal.archives-ouvertes.fr/hal-01201716

X. Li, L. Girin, F. Badeig, and R. Horaud, Reverberant sound localization with a robot head based on direct-path relative transfer function, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
DOI : 10.1109/IROS.2016.7759437

URL : https://hal.archives-ouvertes.fr/hal-01349771

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, ICLR, 2014.