H. Admoni and B. Scassellati, Social eye gaze in human-robot interaction: a review, Journal of Human-Robot Interaction, vol.6, issue.1, pp.25-63, 2017.

B. Ahn, J. Park, and I. S. Kweon, Real-time head orientation from a monocular camera using deep neural network, Asian Conference on Computer Vision, pp.82-96, 2014.

O. Akhtiamov and V. Palkov, Gaze, prosody and semantics: Relevance of various multimodal signals to addressee detection in human-human-computer conversations, International Conference on Speech and Computer, pp.1-10, 2018.

O. Akhtiamov, M. Sidorov, A. A. Karpov, and W. Minker, Speech and text analysis for multimodal addressee detection in human-human-computer interaction, INTERSPEECH, pp.2521-2525, 2017.

S. Andrist, D. Bohus, B. Mutlu, and D. Schlangen, Turn-taking and coordination in human-machine interaction, AI Magazine, vol.37, issue.4, pp.5-6, 2016.

N. Baba, H. H. Huang, and Y. I. Nakano, Addressee identification for human-human-agent multiparty conversations in different proxemics, Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction, vol.6, 2012.

I. Bakx, K. van Turnhout, and J. M. Terken, Facial orientation during multi-party interaction with information kiosks, Proceedings of INTERACT, 2003.

F. Bentley, C. Luvogt, M. Silverman, R. Wirasinghe, B. White et al., Understanding the long-term use of smart speaker assistants, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol.2, p.91, 2018.

D. Bohus and E. Horvitz, Multiparty turn taking in situated dialog: Study, lessons, and directions, Proceedings of the SIGDIAL 2011 Conference, pp.98-109, 2011.

G. Borghi, M. Fabbri, R. Vezzani, S. Calderara, and R. Cucchiara, Face-from-depth for head pose estimation on depth images, 2017.

M. Perreira Da Silva, V. Courboulay, A. Prigent, and P. Estraillier, Real-time face tracking for attention aware adaptive games, International Conference on Computer Vision Systems, pp.99-108, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00281256

J. Gu, X. Yang, S. De Mello, and J. Kautz, Dynamic facial analysis: From Bayesian filtering to recurrent neural network, 2017.

D. F. Dementhon and L. S. Davis, Model-based object pose in 25 lines of code, International journal of computer vision, vol.15, issue.1-2, pp.123-141, 1995.

R. R. Divekar, J. Drozdal, Y. Zhou, Z. Song, D. Allen et al., Interaction challenges in ai equipped environments built to teach foreign languages through dialogue and task-completion, Proceedings of the 2018 Designing Interactive Systems Conference, pp.597-609, 2018.

R. R. Divekar, X. Mou, L. Chen, M. G. De-bayser, M. A. Guerra et al., Embodied conversational AI agents in a multi-modal multi-agent competitive dialogue, 2019.

R. G. Farrell, J. Lenchner, J. O. Kephart, A. M. Webb, M. J. Muller et al., Symbiotic cognitive computing, AI Magazine, vol.37, issue.3, pp.81-93, 2016.

M. Frampton, R. Fernández, P. Ehlen, M. Christoudias, T. Darrell et al., Who is you?: combining linguistic and gaze features to resolve second-person references in dialogue, Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp.273-281, 2009.

A. Gravano and J. Hirschberg, Turn-yielding cues in task-oriented dialogue, Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp.253-261, 2009.

E. Gu and N. I. Badler, Visual attention and eye gaze during multiparty conversations with distractions, Lecture Notes in Computer Science, vol.4133, 2006.

M. Katzenmaier, Identifying the addressee in human-human-robot interactions based on head pose and speech, 2004.

A. Kendon, Some functions of gaze-direction in social interaction, Acta psychologica, vol.26, pp.22-63, 1967.

J. O. Kephart, V. C. Dibia, J. Ellis, B. Srivastava, K. Talamadupula et al., A cognitive assistant for visualizing and analyzing exoplanets, Proc. AAAI, 2018.

T. M. Le, N. Shimizu, T. Miyazaki, and K. Shinoda, Deep learning based multi-modal addressee recognition in visual scenes with utterances, pp.1546-1553, 2018.

G. S. Lin and T. S. Tsai, A face tracking method using feature point tracking, 2012 International Conference on Information Security and Intelligence Control (ISIC), pp.210-213, 2012.

B. Mutlu, T. Kanda, J. Forlizzi, J. Hodgins, and H. Ishiguro, Conversational gaze mechanisms for humanlike robots, ACM Transactions on Interactive Intelligent Systems, vol.1, issue.2, pp.1-33, 2012.

Y. I. Nakano, N. Baba, H. H. Huang, and Y. Hayashi, Implementation and evaluation of a multimodal addressee identification mechanism for multiparty conversation systems, Proceedings of the 15th ACM International Conference on Multimodal Interaction, pp.35-42, 2013.

A. Norouzian, B. Mazoure, D. Connolly, and D. Willett, Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed, 2019.

N. M. Radziwill and M. C. Benton, Evaluating quality of chatbots and intelligent conversational agents, 2017.

S. Ranganatha and Y. Gowramma, An integrated robust approach for fast face tracking in noisy real-world videos with visual constraints, International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp.772-776, 2017.

S. Ravuri and A. Stolcke, Recurrent neural network and LSTM models for lexical utterance classification, Sixteenth Annual Conference of the International Speech Communication Association, 2015.

K. Ruhland, C. E. Peters, S. Andrist, J. B. Badler, N. I. Badler et al., A review of eye gaze in virtual agents, social robotics and HCI: Behaviour generation, user interaction and perception, Computer Graphics Forum, vol.34, p.85, 2015.

A. Sciuto, A. Saini, J. Forlizzi, and J. I. Hong, Hey Alexa, what's up?: A mixed-methods study of in-home conversational agent usage, Proceedings of the 2018 Designing Interactive Systems Conference, pp.857-868, 2018.

S. Sheikhi and J. M. Odobez, Combining dynamic head pose-gaze mapping with the robot conversational state for attention recognition in human-robot interactions, Pattern Recognition Letters, vol.66, pp.81-90, 2015.

E. Shriberg, A. Stolcke, and S. V. Ravuri, Addressee detection for dialog systems using temporal and spectral dimensions of speaking style, INTERSPEECH, pp.2559-2563, 2013.

R. Stiefelhagen and J. Zhu, Head orientation and gaze direction in meetings, CHI'02 Extended Abstracts on Human Factors in Computing Systems, pp.858-859, 2002.

T. Tsai, A. Stolcke, and M. Slaney, A study of multimodal addressee detection in human-human-computer interaction, IEEE Transactions on Multimedia, vol.17, issue.9, pp.1550-1561, 2015.

K. van Turnhout, J. Terken, I. Bakx, and B. Eggen, Identifying the intended addressee in mixed human-human and human-computer interaction from nonverbal features, Proceedings of the 7th International Conference on Multimodal Interfaces, pp.175-182, 2005.

M. Venturelli, G. Borghi, R. Vezzani, and R. Cucchiara, From depth data to head pose estimation: a siamese approach, 2017.

K. Wang and Q. Ji, Real time eye gaze tracking with 3d deformable eye-face model, Proc. IEEE CVPR, pp.1003-1011, 2017.

Y. Wu, C. Gou, and Q. Ji, Simultaneous facial landmark detection, pose and deformation estimation under facial occlusion, 2017.

Y. Wu and Q. Ji, Facial landmark detection: A literature survey, International Journal of Computer Vision, pp.1-28, 2017.

T. L. Xu, H. Zhang, and C. Yu, See you see me: The role of eye contact in multimodal human-robot interaction, ACM Transactions on Interactive Intelligent Systems (TiiS), vol.6, issue.1, 2016.

ZDNet, How Alexa developers are using visual elements for Echo Show, 2018.

R. Zhao, K. Wang, R. Divekar, R. Rouhani, H. Su et al., An immersive system with multi-modal human-computer interaction, 13th IEEE Int'l Conference on Automatic Face & Gesture Recognition (FG 2018), pp.517-524, 2018.