R. E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME-Journal of Basic Engineering, vol.82, pp.35-45, 1960.

H. E. Rauch, C. Striebel, and F. Tung, Maximum likelihood estimates of linear dynamic systems, AIAA Journal, vol.3, issue.8, pp.1445-1450, 1965.

R. Chen and J. S. Liu, Mixture Kalman filters, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.62, issue.3, pp.493-508, 2000.

M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing, vol.50, issue.2, pp.174-188, 2002.

A. H. Jazwinski, Stochastic processes and filtering theory. Courier Corporation, 2007.

S. J. Julier and J. K. Uhlmann, Unscented filtering and nonlinear estimation, Proceedings of the IEEE, vol.92, issue.3, pp.401-422, 2004.

K. P. Murphy, Machine Learning: a Probabilistic Perspective, 2012.

K. Xiong, H. Zhang, and C. Chan, Performance evaluation of UKF-based nonlinear filtering, Automatica, vol.42, issue.2, pp.261-270, 2006.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, IEEE Conference on Computer Vision and Pattern Recognition, vol.1, pp.886-893, 2005.
URL: https://hal.archives-ouvertes.fr/inria-00548512

B. Ahn, J. Park, and I. S. Kweon, Real-time head orientation from a monocular camera using deep neural network, Asian Conference on Computer Vision, pp.82-96, 2014.

S. S. Mukherjee and N. M. Robertson, Deep head pose: Gaze-direction estimation in multimodal video, IEEE Transactions on Multimedia, vol.17, issue.11, pp.2094-2107, 2015.

R. Ranjan, V. M. Patel, and R. Chellappa, HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

S. Lathuilière, R. Juge, P. Mesejo, R. Muñoz-Salinas, and R. Horaud, Deep mixture of linear inverse regressions applied to head-pose estimation, IEEE Conference on Computer Vision and Pattern Recognition, 2017.

C. E. Rasmussen, Gaussian processes for machine learning, 2006.

A. J. Smola and B. Schölkopf, A tutorial on support vector regression, Statistics and computing, vol.14, issue.3, pp.199-222, 2004.

H. Abdi, Encyclopedia for research methods for the social sciences, pp.792-795, 2003.

A. Deleforge, F. Forbes, and R. Horaud, High-dimensional regression with Gaussian mixtures and partially-latent response variables, Statistics and Computing, vol.25, issue.5, pp.893-911, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01107604

V. Drouard, R. Horaud, A. Deleforge, S. Ba, and G. Evangelidis, Robust head-pose estimation based on partially-latent mixture of linear regressions, IEEE Transactions on Image Processing, vol.26, issue.3, pp.1428-1440, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01413406

C. Tu, F. Forbes, B. Lemasson, and N. Wang, Prediction with high dimensional regression via hierarchically structured Gaussian mixtures and latent variables, Journal of the Royal Statistical Society: Series C (Applied Statistics), vol.68, issue.5, pp.1485-1507, 2019.
URL: https://hal.archives-ouvertes.fr/hal-02263144

E. Perthame, F. Forbes, and A. Deleforge, Inverse regression approach to robust non-linear high-to-low dimensional mapping, Journal of Multivariate Analysis, 2017.

A. Deleforge, R. Horaud, Y. Y. Schechner, and L. Girin, Co-localization of audio sources in images using binaural features and locally-linear regression, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.4, pp.718-731, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01112834

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of the direct-path relative transfer function for supervised sound-source localization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.11, pp.2171-2186, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01349691

Z. Ghahramani and G. E. Hinton, Variational learning for switching state-space models, Neural computation, vol.12, issue.4, pp.831-864, 2000.

A. Doucet, N. de Freitas, K. Murphy, and S. Russell, Rao-Blackwellised particle filtering for dynamic Bayesian networks, Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp.176-183, 2000.

A. I. Rosti and M. J. Gales, Rao-Blackwellised Gibbs sampling for switching linear dynamical systems, International Conference on Acoustics, Speech, and Signal Processing, vol.1, pp.809-812, 2004.

S. M. Oh, J. M. Rehg, T. Balch, and F. Dellaert, Learning and inferring motion patterns using parametric segmental switching linear dynamic systems, International Journal of Computer Vision, vol.77, issue.1-3, pp.103-124, 2008.

J. F. Kooij, G. Englebienne, and D. M. Gavrila, Mixture of switching linear dynamics to discover behavior patterns in object tracks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.2, pp.322-334, 2016.

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, Variational inference: A review for statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.

Y. Bar-Shalom and X. Li, Estimation and tracking: Principles, techniques, and software. Artech House, 1993.

Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association, 1988.

K. P. Murphy, Dynamic Bayesian networks: representation, inference and learning, 2002.

X. Boyen and D. Koller, Tractable inference for complex stochastic processes, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp.33-42, 1998.

W. Wu, M. J. Black, D. Mumford, Y. Gao, E. Bienenstock et al., Modeling and decoding motor cortical activity using a switching Kalman filter, IEEE Transactions on Biomedical Engineering, vol.51, issue.6, pp.933-942, 2004.

B. Massé, S. Ba, and R. Horaud, Tracking gaze and visual focus of attention of people involved in social interaction, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.11, pp.2711-2724, 2018.

J. F. Kooij, F. Flohr, E. A. Pool, and D. M. Gavrila, Context-based path prediction for targets with switching dynamics, International Journal of Computer Vision, vol.127, issue.3, pp.239-262, 2019.

D. Barber, Expectation correction for smoothed inference in switching linear dynamical systems, Journal of Machine Learning Research, vol.7, pp.2515-2540, 2006.

B. Mesot and D. Barber, Switching linear dynamical systems for noise robust speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.6, pp.1850-1858, 2007.

V. Pavlovic, B. J. Frey, and T. S. Huang, Variational learning in mixed-state dynamic graphical models, Proceedings of Uncertainty in Artificial Intelligence, pp.522-530, 1999.

L. J. Lee, H. Attias, and L. Deng, Variational inference and learning for segmental switching state space models of hidden speech dynamics, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.1, 2003.

L. J. Lee, H. Attias, L. Deng, and P. Fieguth, A multimodal variational approach to learning and inference in switching state space models, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.505-508, 2004.

L. Deng, Switching dynamic system models for speech articulation and acoustics, Mathematical Foundations of Speech and Language Processing, pp.115-133, 2004.

V. Pavlovic, J. M. Rehg, and J. MacCormick, Learning switching linear models of human motion, Proceedings of Neural Information Processing Systems, 2000.

C. Zhang, J. Bütepage, H. Kjellström, and S. Mandt, Advances in variational inference, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.41, issue.8, pp.2008-2026, 2019.

Z. Ma, A. E. Teschendorff, A. Leijon, Y. Qiao, H. Zhang et al., Variational Bayesian matrix factorization for bounded support data, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.4, pp.876-889, 2014.

Z. Ma, P. K. Rana, J. Taghia, M. Flierl, and A. Leijon, Bayesian estimation of Dirichlet mixture model with variational inference, Pattern Recognition, vol.47, issue.9, pp.3143-3157, 2014.

J. Taghia, Z. Ma, and A. Leijon, Bayesian estimation of the von-Mises Fisher mixture model with variational inference, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.9, pp.1701-1715, 2014.

J. Taghia and A. Leijon, Variational inference for Watson mixture model, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.9, pp.1886-1900, 2015.

A. Deleforge, F. Forbes, S. Ba, and R. Horaud, Hyper-spectral image analysis with partially latent regression and spatial Markov dependencies, IEEE Journal of Selected Topics in Signal Processing, vol.9, issue.6, pp.1037-1048, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01136465

S. Ba, X. Alameda-Pineda, A. Xompero, and R. Horaud, An on-line variational Bayesian model for multi-person tracking from cluttered scenes, Computer Vision and Image Understanding, vol.153, pp.64-76, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01349763

Z. Ma, J. Xie, Y. Lai, J. Taghia, J. Xue et al., Insights into multiple/single lower bound approximation for extended variational inference in non-Gaussian structured data modeling, IEEE Transactions on Neural Networks and Learning Systems, 2019.

M. Byeon, M. Lee, K. Kim, and J. Y. Choi, Variational inference for 3-D localization and tracking of multiple targets using multiple cameras, IEEE Transactions on Neural Networks and Learning Systems, 2019.

Y. Ban, X. Alameda-Pineda, C. Evers, and R. Horaud, Tracking multiple audio sources with the von Mises distribution and variational EM, IEEE Signal Processing Letters, vol.26, issue.6, pp.798-802, 2019.
URL: https://hal.archives-ouvertes.fr/hal-01969050

Y. Ban, X. Alameda-Pineda, L. Girin, and R. Horaud, Variational Bayesian inference for audio-visual tracking of multiple speakers, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
URL: https://hal.archives-ouvertes.fr/hal-01950866

Z. Ghahramani and M. I. Jordan, Factorial hidden Markov models, Advances in Neural Information Processing Systems, 1996.

G. Fanelli, M. Dantone, J. Gall, A. Fossati, and L. Van Gool, Random forests for real time 3D face analysis, International Journal of Computer Vision, vol.101, issue.3, pp.437-458, 2013.

K. A. Mora, F. Monay, and J. Odobez, EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras, ACM Symposium on Eye Tracking Research and Applications, pp.255-258, 2014.

D. B. Jayagopi, S. Sheikhi, D. Klotz, J. Wienke, J. Odobez et al., The Vernissage corpus: A multimodal human-robot-interaction dataset, 2012.

A. Bhattacharyya, On a measure of divergence between two statistical populations defined by their probability distributions, Bulletin of the Calcutta Mathematical Society, vol.35, pp.99-109, 1943.

M. Uřičář, V. Franc, and V. Hlaváč, Detector of facial landmarks learned by the structured output SVM, International Conference on Computer Vision Theory and Applications, 2012.

J. Cech, V. Franc, and J. Matas, A 3D approach to facial landmarks: Detection, refinement, and tracking, International Conference on Pattern Recognition, pp.2173-2178, 2014.

T. Baltrušaitis, P. Robinson, and L. Morency, OpenFace: an open source facial behavior analysis toolkit, IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.

T. Baltrušaitis, P. Robinson, and L. Morency, Constrained local neural fields for robust facial landmark detection in the wild, Proceedings of the IEEE International Conference on Computer Vision Workshops, pp.354-361, 2013.

K. A. Mora and J. Odobez, Gaze estimation from multimodal Kinect data, IEEE CVPRW, 2012.

V. Drouard, S. Ba, and R. Horaud, Switching linear inverse-regression model for tracking head pose, IEEE Winter Conference on Applications of Computer Vision, pp.1232-1240, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01430727