T. Kinnunen and H. Li, An overview of text-independent speaker recognition: From features to supervectors, Speech Communication, vol.52, issue.1, pp.12-40, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00587602

J. H. Hansen and T. Hasan, Speaker recognition by machines and humans: A tutorial review, IEEE Signal Processing Magazine, vol.32, issue.6, pp.74-99, 2015.

ISO/IEC 30107-1:2016, Information technology - Biometric presentation attack detection - Part 1: Framework, International Organization for Standardization, 2016.

T. Kinnunen, M. Sahidullah, I. Kukanov, H. Delgado, M. Todisco et al., Utterance verification for text-dependent speaker recognition: A comparative assessment using the RedDots corpus, Proc. Interspeech, pp.430-434, 2016.

W. Shang and M. Stevenson, Score normalization in playback attack detection, Proc. ICASSP, pp.1678-1681, 2010.

Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre et al., Spoofing and countermeasures for speaker verification: A survey, Speech Communication, vol.66, issue.0, pp.130-153, 2015.

P. Korshunov, S. Marcel, H. Muckenhirn, A. R. Gonçalves, A. G. Mello et al., Overview of BTAS 2016 speaker anti-spoofing competition, IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), pp.1-6, 2016.

N. Evans, T. Kinnunen, J. Yamagishi, Z. Wu, F. Alegre et al., Speaker recognition anti-spoofing, Handbook of Biometric Anti-Spoofing, 2014.

S. Marcel, S. Z. Li, and M. Nixon, Handbook of biometric anti-spoofing: trusted biometrics under spoofing attacks, 2014.

M. Farrús, M. Wagner, D. Erro, and J. Hernando, Automatic speaker recognition as a measurement of voice imitation and conversion, The International Journal of Speech, Language and the Law, vol.17, issue.1, pp.119-142, 2010.

P. Perrot, G. Aversano, and G. Chollet, Voice disguise and automatic detection: review and perspectives. Progress in nonlinear speech processing, pp.101-117, 2007.

E. Zetterholm, Detection of speaker characteristics using voice imitation, Speaker Classification II, pp.192-205, 2007.

Y. W. Lau, M. Wagner, and D. Tran, Vulnerability of speaker verification to voice mimicking, Intelligent Multimedia, Video and Speech Processing, pp.145-148, 2004.

Y. W. Lau, D. Tran, and M. Wagner, Testing voice mimicry with the YOHO speaker verification corpus, International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp.15-21, 2005.

J. Mariéthoz and S. Bengio, Can a professional imitator fool a GMM-based speaker verification system?, IDIAP, 2005.

S. Panjwani and A. Prakash, Crowdsourcing attacks on biometric systems, Symposium On Usable Privacy and Security (SOUPS 2014), pp.257-269, 2014.

R. G. Hautamäki, T. Kinnunen, V. Hautamäki, and A. Laukkanen, Automatic versus human speaker verification: The case of voice mimicry, Speech Communication, vol.72, pp.13-31, 2015.

S. K. Ergunay, E. Khoury, A. Lazaridis, and S. Marcel, On the vulnerability of speaker verification to realistic voice spoofing, IEEE International Conference on Biometrics: Theory, Applications and Systems, pp.1-8, 2015.

J. Lindberg and M. Blomberg, Vulnerability in speaker verification: a study of technical impostor techniques, Proceedings of the European Conference on Speech Communication and Technology, vol.3, pp.1211-1214, 1999.

J. Villalba and E. Lleida, Speaker verification performance degradation against spoofing and tampering attacks, FALA 10 workshop, pp.131-134, 2010.

Z. F. Wang, G. Wei, and Q. H. He, Channel pattern noise based playback attack detection algorithm for speaker recognition, 2011 International Conference on Machine Learning and Cybernetics, vol.4, pp.1708-1713, 2011.
DOI : 10.1109/icmlc.2011.6016982

J. Villalba and E. Lleida, Preventing replay attacks on speaker verification systems, Proc. IEEE International Carnahan Conference on Security Technology (ICCST), pp.1-8, 2011.
DOI : 10.1109/ccst.2011.6095943

J. Gałka, M. Grzywacz, and R. Samborski, Playback attack detection for text-dependent speaker verification over telephone channels, Speech Communication, vol.67, pp.143-153, 2015.

P. Taylor, Text-to-Speech Synthesis, 2009.

D. H. Klatt, Software for a cascade/parallel formant synthesizer, Journal of the Acoustical Society of America, vol.67, pp.971-995, 1980.
DOI : 10.1121/1.383940

E. Moulines and F. Charpentier, Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Commun, vol.9, pp.453-467, 1990.
DOI : 10.1016/0167-6393(90)90021-z

A. Hunt and A. W. Black, Unit selection in a concatenative speech synthesis system using a large speech database, Proc. ICASSP, pp.373-376, 1996.
DOI : 10.1109/icassp.1996.541110

URL : http://www.era.lib.ed.ac.uk/bitstream/1842/1082/1/hunt+1996.pdf

A. Breen and P. Jackson, A phonologically motivated method of selecting nonuniform units, Proc. ICSLP, pp.2735-2738, 1998.

R. E. Donovan and E. M. Eide, The IBM trainable speech synthesis system, Proc. ICSLP, pp.1703-1706, 1998.

B. Beutnagel, A. Conkie, J. Schroeter, Y. Stylianou, and A. Syrdal, The AT&T Next-Gen TTS system, Proc. Joint Meeting of ASA, EAA and DAGA, pp.15-19, 1999.
DOI : 10.1121/1.424924

G. Coorman, J. Fackrell, P. Rutten, and B. Coile, Segment selection in the L&H RealSpeak laboratory TTS system, Proc. ICSLP, pp.395-398, 2000.

T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, Proc. Eurospeech, pp.2347-2350, 1999.

Z. Ling, Y. Wu, Y. Wang, L. Qin, and R. Wang, USTC system for Blizzard Challenge 2006: an improved HMM-based speech synthesis method, Proc. Blizzard Challenge Workshop, 2006.

A. W. Black, CLUSTERGEN: A statistical parametric synthesizer using trajectory modeling, Proc. Interspeech, pp.1762-1765, 2006.

H. Zen, T. Toda, M. Nakamura, and K. Tokuda, Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge, IEICE Trans. Inf. Syst, issue.1, pp.325-333, 2005.

H. Zen, K. Tokuda, and A. W. Black, Statistical parametric speech synthesis. Speech Communication, vol.51, pp.1039-1064, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00746106

J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.1, pp.66-83, 2009.

C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Comput. Speech Lang, vol.9, pp.171-185, 1995.

P. C. Woodland, Speaker adaptation for continuous density HMMs: A review, Proc. ISCA Workshop on Adaptation Methods for Speech Recognition, p.119, 2001.

H. Zen, A. Senior, and M. Schuster, Statistical parametric speech synthesis using deep neural networks, Proc. ICASSP, pp.7962-7966, 2013.
DOI : 10.1109/icassp.2013.6639215

URL : http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40837.pdf

Z. H. Ling, L. Deng, and D. Yu, Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2129-2139, 2013.

Y. Fan, Y. Qian, F. Xie, and F. K. Soong, TTS synthesis with bidirectional LSTM based recurrent neural networks, Proc. Interspeech, pp.1964-1968, 2014.

H. Zen and H. Sak, Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, Proc. ICASSP, pp.4470-4474, 2015.

Z. Wu and S. King, Investigating gated recurrent networks for speech synthesis, Proc. ICASSP, pp.5140-5144, 2016.

X. Wang, S. Takaki, and J. Yamagishi, Investigating very deep highway networks for parametric speech synthesis, 9th ISCA Speech Synthesis Workshop, pp.166-171, 2016.

X. Wang, S. Takaki, and J. Yamagishi, Investigating very deep highway networks for parametric speech synthesis, Speech Communication, vol.96, pp.1-9, 2018.

X. Wang, S. Takaki, and J. Yamagishi, An autoregressive recurrent mixture density network for parametric speech synthesis, Proc. ICASSP, pp.4895-4899, 2017.

X. Wang, S. Takaki, and J. Yamagishi, An RNN-based quantized F0 model with multitier feedback links for text-to-speech synthesis, Proc. Interspeech, pp.1059-1063, 2017.

Y. Saito, S. Takamichi, and H. Saruwatari, Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis, Proc. ICASSP, pp.4900-4904, 2017.

Y. Saito, S. Takamichi, and H. Saruwatari, Statistical parametric speech synthesis incorporating generative adversarial networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.26, issue.1, pp.84-96, 2018.

T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu et al., Generative adversarial network-based postfilter for statistical parametric speech synthesis, Proc. ICASSP, pp.4910-4914, 2017.

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., WaveNet: A generative model for raw audio, arXiv preprint arXiv:1609.03499, 2016.

S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain et al., SampleRNN: An unconditional end-to-end neural audio generation model, arXiv preprint arXiv:1612.07837, 2016.

Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss et al., Tacotron: Towards end-to-end speech synthesis, Proc. Interspeech, pp.4006-4010, 2017.

A. Gibiansky, S. Arik, G. Diamos, J. Miller, K. Peng et al., Deep voice 2: Multi-speaker neural text-to-speech, Advances in Neural Information Processing Systems, pp.2966-2974, 2017.

J. Shen, M. Schuster, N. Jaitly, R. J. Skerry-Ryan, R. A. Saurous et al., Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions, Proc. ICASSP, 2018.

S. King, Measuring a decade of progress in text-to-speech, Loquens, vol.1, issue.1, p.6, 2014.

S. King, L. Wihlborg, and W. Guo, The Blizzard Challenge 2017, Proc. Blizzard Challenge Workshop, 2017.

F. H. Foomany, A. Hirschfield, and M. Ingleby, Toward a dynamic framework for security evaluation of voice verification systems, 2009 IEEE Toronto International Conference on Science and Technology for Humanity (TIC-STH), pp.22-27, 2009.

T. Masuko, T. Hitotsumatsu, K. Tokuda, and T. Kobayashi, On the security of HMM-based speaker verification systems against imposture using synthetic speech, Proc. EUROSPEECH, 1999.

T. Matsui and S. Furui, Likelihood normalization for speaker verification using a phoneme- and speaker-independent model, Speech Commun., vol.17, issue.1-2, pp.109-116, 1995.

T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, Speech synthesis using HMMs with dynamic features, Proc. ICASSP, 1996.

T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, Voice characteristics conversion for HMM-based speech synthesis system, Proc. ICASSP, 1997.

P. L. De Leon, M. Pucher, J. Yamagishi, I. Hernaez, and I. Saratxaga, Evaluation of speaker verification security and detection of HMM-based synthetic speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.8, pp.2280-2290, 2012.

G. Galou, Synthetic voice forgery in the forensic context: a short tutorial, Forensic Speech and Audio Analysis Working Group (ENFSI-FSAAWG), pp.1-3, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00625918

W. Cai, A. Doshi, and R. Valle, Attacking speaker recognition with deep generative models, 2018.

T. Satoh, T. Masuko, T. Kobayashi, and K. Tokuda, A robust speaker verification system against imposture using an HMM-based speech synthesis system, Proc. Eurospeech, 2001.

L. Chen, W. Guo, and L. Dai, Speaker verification against synthetic speech, Proc. 7th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp.309-312, 2010.

T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, 2002.

Z. Wu, E. S. Chng, and H. Li, Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition, Proc. Interspeech, 2012.

A. Ogihara, H. Unno, and A. Shiozaki, Discrimination method of synthetic speech using pitch frequency against synthetic speech falsification, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol.88, issue.1, pp.280-286, 2005.

P. L. De Leon, B. Stewart, and J. Yamagishi, Synthetic speech discrimination using pitch pattern statistics derived from image analysis, Proc. Interspeech, 2012.

Y. Stylianou, Voice transformation: a survey, Proc. ICASSP, pp.3585-3588, 2009.

B. L. Pellom and J. H. Hansen, An experimental study of speaker verification sensitivity to computer voice-altered imposters, Proc. ICASSP, vol.2, pp.837-840, 1999.

S. H. Mohammadi and A. Kain, An overview of voice conversion systems, Speech Communication, vol.88, pp.65-82, 2017.

M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, Voice conversion through vector quantization, Proc. ICASSP, pp.655-658, 1988.

L. M. Arslan, Speaker transformation algorithm using segmental codebooks (STASC), Speech Communication, vol.28, issue.3, pp.211-226, 1999.

A. Kain and M. W. Macon, Spectral voice conversion for text-to-speech synthesis, Proc. ICASSP, vol.1, pp.285-288, 1998.

Y. Stylianou, O. Cappé, and E. Moulines, Continuous probabilistic transform for voice conversion, IEEE Transactions on Speech and Audio Processing, vol.6, issue.2, pp.131-142, 1998.

T. Toda, A. W. Black, and K. Tokuda, Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.8, pp.2222-2235, 2007.

K. Kobayashi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, Statistical singing voice conversion with direct waveform modification based on the spectrum differential, Proc. Interspeech, 2014.

V. Popa, H. Silen, J. Nurminen, and M. Gabbouj, Local linear transformation for voice conversion, Proc. ICASSP, pp.4517-4520, 2012.

Y. Chen, M. Chu, E. Chang, J. Liu, and R. Liu, Voice conversion with smoothed GMM and MAP adaptation, Proc. EUROSPEECH, pp.2413-2416, 2003.

H. Hwang, Y. Tsao, H. Wang, Y. Wang, and S. Chen, A study of mutual information for GMM-based spectral conversion, Proc. Interspeech, 2012.

E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, Voice conversion using partial least squares regression, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.5, pp.912-921, 2010.

N. Pilkington, H. Zen, and M. Gales, Gaussian process experts for voice conversion, Proc. Interspeech, 2011.

D. Saito, K. Yamamoto, N. Minematsu, and K. Hirose, One-to-many voice conversion based on tensor representation of speaker space, Proc. Interspeech, pp.653-656, 2011.

H. Zen, Y. Nankaku, and K. Tokuda, Continuous stochastic feature mapping based on trajectory HMMs, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, pp.417-430, 2011.

Z. Wu, T. Kinnunen, E. S. Chng, and H. Li, Mixture of factor analyzers using priors from non-parallel speech for voice conversion, IEEE Signal Processing Letters, vol.19, issue.12, 2012.

D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, Statistical voice conversion based on noisy channel model, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.6, pp.1784-1794, 2012.

P. Song, Y. Q. Bao, L. Zhao, and C. R. Zou, Voice conversion using support vector regression, Electronics letters, vol.47, issue.18, pp.1045-1046, 2011.

E. Helander, H. Silén, T. Virtanen, and M. Gabbouj, Voice conversion using dynamic kernel partial least squares regression, IEEE Trans. Audio, Speech and Language Processing, vol.20, issue.3, pp.806-817, 2012.

Z. Wu, E. S. Chng, and H. Li, Conditional restricted Boltzmann machine for voice conversion, Proc. IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), 2013.

M. Narendranath, H. Murthy, S. Rajendran, and B. Yegnanarayana, Transformation of formants for voice conversion using artificial neural networks, Speech Communication, vol.16, issue.2, pp.207-216, 1995.

S. Desai, E. Raghavendra, B. Yegnanarayana, A. Black, and K. Prahallad, Voice conversion using artificial neural networks, Proc. ICASSP, pp.3893-3896, 2009.

Y. Saito, S. Takamichi, and H. Saruwatari, Voice conversion using input-to-output highway networks, IEICE Transactions on Information and Systems, vol.100, issue.8, pp.1925-1928, 2017.

T. Nakashika, T. Takiguchi, and Y. Ariki, Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.23, issue.3, pp.580-587, 2015.

L. Sun, S. Kang, K. Li, and H. Meng, Voice conversion using deep bidirectional long short-term memory based recurrent neural networks, Proc. ICASSP, pp.4869-4873, 2015.

D. Sundermann and H. Ney, VTLN-based voice conversion, Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp.556-559, 2003.

D. Erro, A. Moreno, and A. Bonafonte, Voice conversion based on weighted frequency warping, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.5, pp.922-931, 2010.

D. Erro, E. Navas, and I. Hernaez, Parametric voice conversion based on bilinear frequency warping plus amplitude scaling, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.3, pp.556-566, 2013.
DOI : 10.1109/tasl.2012.2227735

C. Hsu, H. Hwang, Y. Wu, Y. Tsao, and H. Wang, Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks, Proc. Interspeech, pp.3364-3368, 2017.
DOI : 10.21437/interspeech.2017-63

URL : http://arxiv.org/pdf/1704.00849

H. Miyoshi, Y. Saito, S. Takamichi, and H. Saruwatari, Voice conversion using sequence-to-sequence learning of context posterior probabilities, Proc. Interspeech, pp.1268-1272, 2017.
DOI : 10.21437/interspeech.2017-247

URL : http://arxiv.org/pdf/1704.02360

F. Fang, J. Yamagishi, I. Echizen, and J. Lorenzo-Trueba, High-quality nonparallel voice conversion based on cycle-consistent adversarial network, Proc. ICASSP, 2018.
DOI : 10.1109/icassp.2018.8462342

URL : http://arxiv.org/pdf/1804.00425

K. Kobayashi, T. Hayashi, A. Tamamori, and T. Toda, Statistical voice conversion with WaveNet-based waveform generation, Proc. Interspeech, pp.1138-1142, 2017.
DOI : 10.21437/interspeech.2017-986

B. Gillet and S. King, Transforming F0 contours, Proc. EUROSPEECH, pp.101-104, 2003.

C. Wu, C. Hsia, T. Liu, and J. Wang, Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.4, pp.1109-1116, 2006.

E. Helander and J. Nurminen, A novel method for prosody prediction in voice conversion, Proc. ICASSP, vol.4, p.509, 2007.
DOI : 10.1109/icassp.2007.366961

Z. Wu, T. Kinnunen, E. S. Chng, and H. Li, Text-independent F0 transformation with nonparallel data for voice conversion, Proc. Interspeech, 2010.

D. Lolive, N. Barbot, and O. Boeffard, Pitch and duration transformation with non-parallel data, Speech Prosody, pp.111-114, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00987810

T. Toda, L. Chen, D. Saito, F. Villavicencio, M. Wester et al., The voice conversion challenge 2016, Proc. Interspeech, pp.1632-1636, 2016.
DOI : 10.21437/interspeech.2016-1066

URL : https://www.pure.ed.ac.uk/ws/files/28565036/1066.PDF

M. Wester, Z. Wu, and J. Yamagishi, Analysis of the voice conversion challenge 2016 evaluation results, Proc. Interspeech, pp.1637-1641, 2016.

P. Perrot, G. Aversano, R. Blouet, M. Charbit, and G. Chollet, Voice forgery using ALISP: indexation in a client memory, Proc. ICASSP, vol.1, pp.17-20, 2005.
DOI : 10.1109/icassp.2005.1415039

D. Matrouf, J. Bonastre, and C. Fredouille, Effect of speech transformation on impostor acceptance, Proc. ICASSP, vol.1, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01318472

T. Kinnunen, Z. Wu, K. A. Lee, F. Sedlak, E. S. Chng et al., Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech, Proc. ICASSP, pp.4401-4404, 2012.

D. Sundermann, H. Hoge, A. Bonafonte, H. Ney, A. Black et al., Text-independent voice conversion based on unit selection, Proc. ICASSP, vol.1, 2006.
DOI : 10.1109/icassp.2006.1659962

URL : http://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100081.pdf

Z. Wu, A. Larcher, K. A. Lee, E. S. Chng, T. Kinnunen et al., Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints, Proc. Interspeech, 2013.

F. Alegre, R. Vipperla, N. Evans, and B. Fauve, On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals, Proc. European Signal Processing Conference (EUSIPCO), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00783812

P. L. De Leon, I. Hernaez, I. Saratxaga, M. Pucher, and J. Yamagishi, Detection of synthetic speech for the problem of imposture, Proc. ICASSP, pp.4844-4847, 2011.

Z. Wu, T. Kinnunen, E. S. Chng, H. Li, and E. Ambikairajah, A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case, Proc. Asia-Pacific Signal Information Processing Association Annual Summit and Conference (APSIPA ASC), pp.1-5, 2012.

F. Alegre, R. Vipperla, and N. Evans, Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals, Proc. Interspeech, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00783789

F. Alegre, A. Amehraye, and N. Evans, Spoofing countermeasures to protect automatic speaker verification from voice conversion, Proc. ICASSP, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00804543

Z. Wu, X. Xiao, E. S. Chng, and H. Li, Synthetic speech detection using temporal modulation feature, Proc. ICASSP, 2013.

F. Alegre, R. Vipperla, A. Amehraye, and N. Evans, A new speaker verification spoofing countermeasure based on local binary patterns, Proc. Interspeech, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00849138

Z. Wu, T. Kinnunen, N. Evans, J. Yamagishi, C. Hanilçi et al., ASVspoof 2015: The first automatic speaker verification spoofing and countermeasures challenge, Proc. Interspeech, 2015.

T. Kinnunen, M. Sahidullah, H. Delgado, M. Todisco, N. Evans et al., The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection, Proc. Interspeech, 2017.

Z. Wu, A. Khodabakhsh, C. Demiroglu, J. Yamagishi, D. Saito et al., SAS: A speaker verification spoofing database containing diverse attacks, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2015.

Z. Wu, T. Kinnunen, N. Evans, and J. Yamagishi, ASVspoof 2015: Automatic speaker verification spoofing and countermeasures challenge evaluation plan, 2014.

T. B. Patel and H. A. Patil, Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech, Proc. Interspeech, 2015.

S. Novoselov, A. Kozlov, G. Lavrentyeva, K. Simonchik, and V. Shchemelinin, STC antispoofing systems for the ASVspoof 2015 challenge, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp.5475-5479, 2016.

N. Chen, Y. Qian, H. Dinkel, B. Chen, and K. Yu, Robust deep feature for spoofing detection: the SJTU system for ASVspoof 2015 challenge, Proc. Interspeech, 2015.

X. Xiao, X. Tian, S. Du, H. Xu, E. S. Chng et al., Spoofing speech detection using high dimensional magnitude and phase features: The NTU approach for ASVspoof 2015 challenge, Proc. Interspeech, 2015.

M. J. Alam, P. Kenny, G. Bhattacharya, and T. Stafylakis, Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge, Proc. Interspeech, 2015.

Z. Wu, J. Yamagishi, T. Kinnunen, C. Hanilçi, M. Sahidullah et al., ASVspoof: The automatic speaker verification spoofing and countermeasures challenge, IEEE Journal of Selected Topics in Signal Processing, vol.11, issue.4, pp.588-604, 2017.

H. Delgado, M. Todisco, M. Sahidullah, N. Evans, T. Kinnunen et al., ASVspoof 2017 version 2.0: meta-data analysis and baseline enhancements, Proc. Odyssey: The Speaker and Language Recognition Workshop, pp.296-303, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01880206

M. Todisco, H. Delgado, and N. Evans, A new feature for automatic speaker verification anti-spoofing: Constant Q cepstral coefficients, Proc. Odyssey: the Speaker and Language Recognition Workshop, pp.283-290, 2016.

M. Todisco, H. Delgado, and N. Evans, Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification, Computer Speech & Language, vol.45, pp.516-535, 2017.

G. Lavrentyeva, S. Novoselov, E. Malykh, A. Kozlov, O. Kudashev et al., Audio replay attack detection with deep learning frameworks, Proc. Interspeech, pp.82-86, 2017.

Z. Ji, Z. Y. Li, P. Li, M. An, S. Gao et al., Ensemble learning for countermeasure of audio replay spoofing attack in ASVspoof2017, Proc. Interspeech, pp.87-91, 2017.

L. Li, Y. Chen, D. Wang, and T. F. Zheng, A study on replay attack and anti-spoofing for automatic speaker verification, Proc. Interspeech, pp.92-96, 2017.

H. A. Patil, M. R. Kamble, T. B. Patel, and M. H. Soni, Novel variable length Teager energy separation based instantaneous frequency features for replay detection, Proc. Interspeech, pp.12-16, 2017.
DOI : 10.21437/interspeech.2017-1362

Z. Chen, Z. Xie, W. Zhang, and X. Xu, ResNet and model fusion for automatic spoofing detection, Proc. Interspeech, pp.102-106, 2017.
DOI : 10.21437/interspeech.2017-1085

Z. Wu, S. Gao, E. S. Chng, and H. Li, A study on replay attack and anti-spoofing for text-dependent speaker verification, Proc. Asia-Pacific Signal Information Processing Association Annual Summit and Conference (APSIPA ASC), pp.1-5, 2014.
DOI : 10.1109/apsipa.2014.7041636

Q. Li, An auditory-based transform for audio signal processing, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.181-184, 2009.
DOI : 10.1109/aspaa.2009.5346541

S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, issue.4, pp.357-366, 1980.

M. Sahidullah, T. Kinnunen, and C. Hanilci, A comparison of features for synthetic speech detection, Proc. Interspeech, pp.2087-2091, 2015.

J. Brown, Calculation of a constant Q spectral transform, Journal of the Acoustical Society of America, vol.89, issue.1, pp.425-434, 1991.
DOI : 10.1121/1.400476

M. J. Alam and P. Kenny, Spoofing detection employing infinite impulse response-constant Q transform-based feature representations, Proc. European Signal Processing Conference, 2017.
DOI : 10.23919/eusipco.2017.8081177

URL : https://zenodo.org/record/1160060/files/1570347518.pdf

P. Cancela, M. Rocamora, and E. López, An efficient multi-resolution spectral transform for music analysis, Proc. International Society for Music Information Retrieval Conference, pp.309-314, 2009.

Y. Bengio, Learning deep architectures for AI. Foundations and Trends in Machine Learning, vol.2, pp.1-127, 2009.
DOI : 10.1561/2200000006

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.

Y. Tian, M. Cai, L. He, and J. Liu, Investigation of bottleneck features and multilingual deep neural networks for speaker verification, Proc. Interspeech, pp.1151-1155, 2015.

F. Richardson, D. Reynolds, and N. Dehak, Deep neural network approaches to speaker and language recognition, IEEE Signal Processing Letters, vol.22, issue.10, pp.1671-1675, 2015.
DOI : 10.1109/lsp.2015.2420092

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.

M. J. Alam, P. Kenny, V. Gupta, and T. Stafylakis, Spoofing detection on the ASVspoof2015 challenge corpus employing deep neural networks, Proc. Odyssey: the Speaker and Language Recognition Workshop, pp.270-276, 2016.
DOI : 10.21437/odyssey.2016-39

Y. Qian, N. Chen, and K. Yu, Deep features for automatic spoofing detection, Speech Communication, vol.85, pp.43-52, 2016.
DOI : 10.1016/j.specom.2016.10.007

H. Yu, Z. H. Tan, Y. Zhang, Z. Ma, and J. Guo, DNN filter bank cepstral coefficients for spoofing detection, IEEE Access, vol.5, pp.4779-4787, 2017.
DOI : 10.1109/access.2017.2687041

URL : https://doi.org/10.1109/access.2017.2687041

K. Sriskandaraja, V. Sethu, E. Ambikairajah, and H. Li, Front-end for antispoofing countermeasures in speaker verification: Scattering spectral decomposition, IEEE Journal of Selected Topics in Signal Processing, vol.11, issue.4, pp.632-643, 2017.

J. Andén and S. Mallat, Deep scattering spectrum, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4114-4128, 2014.

S. Mallat, Group invariant scattering, Communications on Pure and Applied Mathematics, vol.65, pp.1331-1398, 2012.
DOI : 10.1002/cpa.21413

URL : http://arxiv.org/pdf/1101.2286

M. Pal, D. Paul, and G. Saha, Synthetic speech detection using fundamental frequency variation and spectral features, Computer Speech & Language, vol.48, pp.31-50, 2018.
DOI : 10.1016/j.csl.2017.10.001

K. Laskowski, M. Heldner, and J. Edlund, The fundamental frequency variation spectrum, Proceedings of FONETIK, pp.29-32, 2008.

I. Saratxaga, J. Sanchez, Z. Wu, I. Hernaez, and E. Navas, Synthetic speech detection using phase information, Speech Communication, vol.81, pp.30-41, 2016.
DOI : 10.1016/j.specom.2016.04.001

URL : https://addi.ehu.es/bitstream/10810/23565/7/Speech%20Communication%20SSD%20using%20phase.pdf

L. Wang, S. Nakagawa, Z. Zhang, Y. Yoshida, and Y. Kawakami, Spoofing speech detection using modified relative phase information, IEEE Journal of Selected Topics in Signal Processing, vol.11, issue.4, pp.660-670, 2017.
DOI : 10.1109/jstsp.2017.2694139

S. Chakroborty and G. Saha, Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter, International Journal of Signal Processing, vol.5, issue.1, pp.11-19, 2009.

X. Wu, R. He, Z. Sun, and T. Tan, A light CNN for deep face representation with noisy labels, IEEE Transactions on Information Forensics and Security, vol.13, issue.11, pp.2884-2896, 2018.
DOI : 10.1109/tifs.2018.2833032

URL : http://arxiv.org/pdf/1511.02683

A. R. Goncalves, R. P. Violato, P. Korshunov, S. Marcel, and F. O. Simoes, On the generalization of fused systems in voice presentation attack detection, 2017 International Conference of the Biometrics Special Interest Group (BIOSIG), pp.1-5, 2017.

D. Paul, M. Pal, and G. Saha, Novel speech features for improved detection of spoofing attacks, Proc. Annual IEEE India Conference (INDICON), 2016.
DOI : 10.1109/indicon.2015.7443805

URL : http://arxiv.org/pdf/1603.04264

N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, Front-end factor analysis for speaker verification, IEEE Trans. Audio, Speech and Language Processing, vol.19, issue.4, pp.788-798, 2011.

E. Khoury, T. Kinnunen, A. Sizov, Z. Wu, and S. Marcel, Introducing i-vectors for joint anti-spoofing and speaker verification, Proc. Interspeech, 2014.

A. Sizov, E. Khoury, T. Kinnunen, Z. Wu, and S. Marcel, Joint speaker verification and antispoofing in the i-vector space, IEEE Trans. Information Forensics and Security, vol.10, issue.4, pp.821-832, 2015.

C. Hanilçi, Data selection for i-vector based automatic speaker verification anti-spoofing, Digital Signal Processing, vol.72, pp.171-180, 2018.

X. Tian, Z. Wu, X. Xiao, E. S. Chng, and H. Li, Spoofing detection from a feature representation perspective, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp.2119-2123, 2016.
DOI : 10.1109/icassp.2016.7472051

H. Yu, Z. H. Tan, Z. Ma, R. Martin, and J. Guo, Spoofing detection in automatic speaker verification systems using DNN classifiers and dynamic acoustic features, IEEE Transactions on Neural Networks and Learning Systems, issue.99, pp.1-12, 2018.
DOI : 10.1109/tnnls.2017.2771947

H. Dinkel, N. Chen, Y. Qian, and K. Yu, End-to-end spoofing detection with raw waveform CLDNNs, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp.4860-4864, 2017.
DOI : 10.1109/icassp.2017.7953080

T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson, and O. Vinyals, Learning the speech front-end with raw waveform CLDNNs, Proc. Interspeech, 2015.

C. Zhang, C. Yu, and J. H. Hansen, An investigation of deep-learning frameworks for speaker verification antispoofing, IEEE Journal of Selected Topics in Signal Processing, vol.11, issue.4, pp.684-694, 2017.
DOI : 10.1109/jstsp.2016.2647199

H. Muckenhirn, M. Magimai-Doss, and S. Marcel, End-to-end convolutional neural network-based voice presentation attack detection, 2017 IEEE International Joint Conference on Biometrics (IJCB), pp.335-341, 2017.
DOI : 10.1109/btas.2017.8272715

S. Chen, K. Ren, S. Piao, C. Wang, Q. Wang et al., You can hear but you cannot steal: Defending against voice impersonation attacks on smartphones, Proc. IEEE International Conference on Distributed Computing Systems (ICDCS), pp.183-195, 2017.
DOI : 10.1109/icdcs.2017.133

S. Shiota, F. Villavicencio, J. Yamagishi, N. Ono, I. Echizen et al., Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification, Proc. Interspeech, 2015.

S. Shiota, F. Villavicencio, J. Yamagishi, N. Ono, I. Echizen et al., Voice liveness detection for speaker verification based on a tandem single/double-channel pop noise detector, Proc. Odyssey: the Speaker and Language Recognition Workshop, 2016.
DOI : 10.21437/odyssey.2016-37

M. Sahidullah, D. A. Thomsen, R. G. Hautamäki, T. Kinnunen, Z. Tan et al., Robust voice liveness detection and speaker verification using throat microphones, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.26, pp.44-56, 2018.
DOI : 10.1109/taslp.2017.2760243

URL : https://erepo.uef.fi/bitstream/123456789/4377/1/IEEE_TASLP_ThroatMic_Accepted.pdf

G. W. Elko, J. Meyer, S. Backer, and J. Peissig, Electronic pop protection for microphones, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.46-49, 2007.
DOI : 10.1109/aspaa.2007.4393041

L. Zhang, S. Tan, J. Yang, and Y. Chen, Voicelive: A phoneme localization based liveness detection for voice authentication on smartphones, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp.1080-1091, 2016.

L. Zhang, S. Tan, and J. Yang, Hearing your voice is not enough: An articulatory gesture based liveness detection for voice authentication, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp.57-71, 2017.
DOI : 10.1145/3133956.3133962

C. Hanilçi, T. Kinnunen, M. Sahidullah, and A. Sizov, Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise, Speech Communication, vol.85, pp.83-97, 2016.

H. Yu, A. K. Sarkar, D. A. Thomsen, Z. Tan, Z. Ma et al., Effect of multi-condition training and speech enhancement methods on spoofing detection, Proc. International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE), 2016.
DOI : 10.1109/splim.2016.7528399

X. Tian, Z. Wu, X. Xiao, E. S. Chng, and H. Li, An investigation of spoofing speech detection under additive noise and reverberant conditions, Proc. Interspeech, 2016.
DOI : 10.21437/interspeech.2016-743

H. Delgado, M. Todisco, N. Evans, M. Sahidullah, W. M. Liu et al., Impact of bandwidth and channel variation on presentation attack detection for speaker verification, 2017 International Conference of the Biometrics Special Interest Group (BIOSIG), pp.1-6, 2017.

Y. Qian, N. Chen, H. Dinkel, and Z. Wu, Deep feature engineering for noise robust spoofing detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.10, pp.1942-1955, 2017.
DOI : 10.1109/taslp.2017.2732162

P. Korshunov and S. Marcel, Cross-database evaluation of audio-based spoofing detection systems, Proc. Interspeech, 2016.
DOI : 10.21437/interspeech.2016-1326

URL : https://infoscience.epfl.ch/record/219837/files/Korshunov_INTERSPEECH_2016.pdf

D. Paul, M. Sahidullah, and G. Saha, Generalization of spoofing countermeasures: A case study with ASVspoof 2015 and BTAS 2016 corpora, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp.2047-2051, 2017.
DOI : 10.1109/icassp.2017.7952516

URL : https://erepo.uef.fi/bitstream/123456789/4368/1/paul_generalization_fd.pdf

J. Lorenzo-Trueba, F. Fang, X. Wang, I. Echizen, J. Yamagishi et al., Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data, Proc. Odyssey: the Speaker and Language Recognition Workshop, 2018.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, pp.2672-2680, 2014.

F. Kreuk, Y. Adi, M. Cisse, and J. Keshet, Fooling end-to-end speaker verification by adversarial examples, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
DOI : 10.1109/icassp.2018.8462693

URL : http://arxiv.org/pdf/1801.03339

M. Sahidullah, H. Delgado, M. Todisco, H. Yu, T. Kinnunen et al., Integrated spoofing countermeasures and automatic speaker verification: an evaluation on ASVspoof 2015, Proc. Interspeech, 2016.
DOI : 10.21437/interspeech.2016-1280

URL : http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1280.PDF

H. Muckenhirn, P. Korshunov, M. Magimai-Doss, and S. Marcel, Long-term spectral statistics for voice presentation attack detection, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.25, issue.11, pp.2098-2111, 2017.
DOI : 10.1109/taslp.2017.2743340

URL : https://infoscience.epfl.ch/record/226623/files/Muckenhirn_Idiap-RR-11-2017.pdf

A. Sarkar, M. Sahidullah, Z. Tan, and T. Kinnunen, Improving speaker verification performance in presence of spoofing attacks using out-of-domain spoofed data, Proc. Interspeech, 2017.

T. Kinnunen, K. A. Lee, H. Delgado, N. Evans, M. Todisco et al., t-DCF: a detection cost function for the tandem assessment of spoofing countermeasures and automatic speaker verification, Proc. Odyssey: the Speaker and Language Recognition Workshop, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01880306

M. Todisco, H. Delgado, K. A. Lee, M. Sahidullah, N. Evans et al., Integrated presentation attack detection and automatic speaker verification: Common features and Gaussian back-end fusion, Proc. Interspeech, 2018.
DOI : 10.21437/interspeech.2018-2289

URL : https://hal.archives-ouvertes.fr/hal-01889934

Z. Wu, P. L. De Leon, C. Demiroglu, A. Khodabakhsh, S. King et al., Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.4, pp.768-783, 2016.