G. López, L. Quesada, and L. A. Guerrero, Alexa vs. Siri vs. Cortana vs. Google Assistant: a comparison of speech-based natural user interfaces, International Conference on Applied Human Factors and Ergonomics, pp.241-250, 2017.

V. Kepuska and G. Bohouta, Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home), IEEE CCWC, pp.99-103, 2018.

X. Lei, G. Tu, A. X. Liu, C. Li, and T. Xie, The insecurity of home digital voice assistants – Amazon Alexa as a case study, 2017.

H. Chung, M. Iorga, J. Voas, and S. Lee, Alexa, can I trust you?, Computer, vol.50, issue.9, pp.100-104, 2017.

D. A. Reynolds, Speaker identification and verification using Gaussian mixture speaker models, Speech Communication, vol.17, issue.1-2, pp.91-108, 1995.

Y. Gu, X. Li, S. Chen, J. Zhang, and I. Marsic, Speech intention classification with multimodal deep learning, Canadian Conference on Artificial Intelligence, pp.260-271, 2017.

N. Hellbernd and D. Sammler, Prosody conveys speaker's intentions: Acoustic cues for speech act perception, Journal of Memory and Language, vol.88, pp.70-86, 2016.

T. Ballmer and W. Brennstuhl, Speech act classification: A study in the lexical analysis of English speech activity verbs, vol.8, 2013.

A. Stolcke, E. Shriberg, R. Bates, N. Coccaro, D. Jurafsky et al., Dialog act modeling for conversational speech, AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp.98-105, 1998.

Y. Zeng, Z. Wu, T. Falk, and W. Chan, Robust GMM based gender classification using pitch and RASTA-PLP parameters of speech, International Conference on Machine Learning and Cybernetics, pp.3376-3379, 2006.

M. Kotti and C. Kotropoulos, Gender classification in two emotional speech databases, ICPR, pp.1-4, 2008.

M. E. Ayadi, M. S. Kamel, and F. Karray, Survey on speech emotion recognition: Features, classification schemes, and databases, Pattern Recognition, vol.44, issue.3, pp.572-587, 2011.

D. Ververidis and C. Kotropoulos, Automatic speech classification to five emotional states based on gender information, EUSIPCO, pp.341-344, 2004.

O. Kwon, K. Chan, J. Hao, and T. Lee, Emotion recognition by speech signals, EuroSpeech, 2003.

A. A. Dibazar, S. Narayanan, and T. W. Berger, Feature analysis for automatic detection of pathological speech, 2nd Joint EMBS-BMES Conference, vol.1, pp.182-183, 2002.

K. Umapathy and S. Krishnan, Feature analysis of pathological speech signals using local discriminant bases technique, Medical and Biological Engineering and Computing, vol.43, issue.4, pp.457-464, 2005.

B. Schuller, S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer et al., The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism, Interspeech, pp.148-152, 2013.

B. Schuller and A. Batliner, Computational paralinguistics: emotion, affect and personality in speech and language processing, 2013.

B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli et al., A survey on perceived speaker traits: Personality, likability, pathology, and the first challenge, Computer Speech & Language, vol.29, issue.1, pp.100-131, 2015.

K. Sekiyama, Cultural and linguistic factors in audiovisual speech processing: The McGurk effect in Chinese subjects, Perception & Psychophysics, vol.59, issue.1, pp.73-80, 1997.

A. Vinciarelli, M. Pantic, and H. Bourlard, Social signal processing: Survey of an emerging domain, Image and Vision Computing, vol.27, issue.12, pp.1743-1759, 2009.

M. A. Pathak, Privacy-preserving machine learning for speech processing, 2012.

C. Glackin, G. Chollet, N. Dugan, N. Cannings, J. Wall et al., Privacy preserving encrypted phonetic search of speech data, IEEE ICASSP, pp.6414-6418, 2017.

D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, X-vectors: Robust DNN embeddings for speaker recognition, IEEE ICASSP, pp.5329-5333, 2018.

S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba et al., ESPnet: End-to-end speech processing toolkit, Interspeech, pp.2207-2211, 2018.

C. Feutry, P. Piantanida, Y. Bengio, and P. Duhamel, Learning anonymized representations with adversarial neural networks, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01742447

D. Serdyuk, K. Audhkhasi, P. Brakel, B. Ramabhadran, S. Thomas et al., Invariant representations for noisy speech recognition, 2016.

T. Tsuchiya, N. Tawara, T. Ogawa, and T. Kobayashi, Speaker invariant feature extraction for zero-resource languages with adversarial learning, IEEE ICASSP, pp.2381-2385, 2018.

Z. Meng, J. Li, Z. Chen, Y. Zhao, V. Mazalov et al., Speaker-invariant training via adversarial learning, IEEE ICASSP, pp.5969-5973, 2018.

Y. Adi, N. Zeghidour, R. Collobert, N. Usunier, V. Liptchinsky et al., To reverse the gradient or not: An empirical comparison of adversarial and multi-task learning in speech recognition, 2018.

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: an ASR corpus based on public domain audio books, IEEE ICASSP, pp.5206-5210, 2015.

S. Watanabe, T. Hori, S. Kim, J. R. Hershey, and T. Hayashi, Hybrid CTC/attention architecture for end-to-end speech recognition, IEEE Journal of Selected Topics in Signal Processing, vol.11, issue.8, pp.1240-1253, 2017.

Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle et al., Domain-adversarial training of neural networks, JMLR, vol.17, issue.1, pp.2096-2130, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01624607

T. Ko, V. Peddinti, D. Povey, M. L. Seltzer, and S. Khudanpur, A study on data augmentation of reverberant speech for robust speech recognition, IEEE ICASSP, pp.5220-5224, 2017.

D. Snyder, G. Chen, and D. Povey, MUSAN: A music, speech, and noise corpus, 2015.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, Tech. Rep., 2011.

J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, Attention-based models for speech recognition, NIPS, pp.577-585, 2015.

N. Zeghidour, Q. Xu, V. Liptchinsky, N. Usunier, G. Synnaeve et al., Fully convolutional speech recognition, 2018.