R. Qader, G. Lecorvé, D. Lolive, and P. Sébillot, Probabilistic Speaker Pronunciation Adaptation for Spontaneous Speech Synthesis Using Linguistic Features, Proceedings of the International Conference on Statistical Language and Speech Processing, 2015.
DOI : 10.1109/LSP.2010.2098440
URL : https://hal.archives-ouvertes.fr/hal-01181192

M. Tahon, R. Qader, G. Lecorvé, and D. Lolive, Improving TTS with Corpus-Specific Pronunciation Adaptation, Interspeech 2016, 2016.
DOI : 10.21437/Interspeech.2016-864
URL : https://hal.archives-ouvertes.fr/hal-01338111

M. Tahon, R. Qader, G. Lecorvé, and D. Lolive, Optimal Feature Set and Minimal Training Size for Pronunciation Adaptation in TTS, Proceedings of the International Conference on Statistical Language and Speech Processing, 2016.
DOI : 10.21437/Interspeech.2016-864
URL : https://hal.archives-ouvertes.fr/hal-01338853

R. Qader, G. Lecorvé, D. Lolive, and P. Sébillot, Adaptation de la prononciation pour la synthèse de la parole spontanée en utilisant des informations linguistiques.

R. Qader, G. Lecorvé, D. Lolive, and P. Sébillot, Ajout automatique de disfluences pour la synthèse de la parole spontanée : formalisation et preuve de concept.

M. Adda-decker and L. Lamel, Pronunciation variants across system configuration, language and speaking style, Speech Communication, vol.29, issue.2-4, 1999.
DOI : 10.1016/S0167-6393(99)00032-1
URL : ftp://tlp.limsi.fr/public/spc99pron.ps.Z

M. Adda-decker, P. Boula-de-mareüil, and L. Lamel, Pronunciation variants in French: schwa & liaison, International Congress of Phonetic Sciences, 1999.

M. Adda-decker, P. Boula-de-mareüil, G. Adda, and L. Lamel, Investigating syllabic structures and their variation in spontaneous French, Speech Communication, vol.46, issue.2, 2005.
DOI : 10.1016/j.specom.2005.03.006

J. Adell, A. Bonafonte, and D. Escudero, Filled Pauses in Speech Synthesis: Towards Conversational Speech, Proceedings of Text, Speech and Dialogue (TSD), 2007.
DOI : 10.1007/978-3-540-74628-7_47
URL : http://gps-tsc.upc.es/veu/research/pubs/download/Ade_Fil_07.pdf

J. Adell, A. Bonafonte, and D. E. Mancebo, On the generation of synthetic disfluent speech: local prosodic modifications caused by the insertion of editing terms, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2008.

J. Adell, D. Escudero, and A. Bonafonte, Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence, Speech Communication, vol.54, issue.3, 2012.
DOI : 10.1016/j.specom.2011.10.010

S. Andersson, K. Georgila, D. Traum, M. Aylett, A. Robert et al., Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection, Speech Prosody, 2010.

L. M. Arslan and J. H. Hansen, Language accent classification in American English, Speech Communication, vol.18, 1996.

M. Ashby and J. A. Maidment, Introducing Phonetic Science, 2005.
DOI : 10.1017/CBO9780511808852

J. Bachorowski, Vocal Expression and Perception of Emotion, Current Directions in Psychological Science, vol.15, issue.2, 1999.
DOI : 10.1007/BF00995674

R. Bates and M. Ostendorf, Modeling pronunciation variation in conversational speech using prosody, Proceedings of ISCA Workshop on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (ITRW), 2002.

A. Bell, D. Jurafsky, E. Fosler-lussier, C. Girand, M. Gregory et al., Effects of disfluencies, predictability, and utterance position on word form variation in English conversation, The Journal of the Acoustical Society of America, vol.113, issue.2, 2003.
DOI : 10.1121/1.1534836

A. Bell, J. M. Brenier, M. Gregory, C. Girand, and D. Jurafsky, Predictability effects on durations of content and function words in conversational English, Journal of Memory and Language, vol.60, issue.1, 2009.
DOI : 10.1016/j.jml.2008.06.003

G. Beller, Transformation of expressivity in speech, Linguistic Insights, 2009.

C. L. Bennett and A. W. Black, Using acoustic models to choose pronunciation variations for synthetic voices, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2003.

C. M. Bishop, Neural networks for pattern recognition, 1995.

S. E. Brennan and M. F. Schober, How listeners compensate for disfluencies in spontaneous speech, Journal of Memory and Language, vol.44, 2001.

K. Brown and K. Allan, Concise encyclopedia of semantics, 2010.

J. Byrnes, Signal Processing for Multimedia, 1999.

B. Cannas, . Cau, . Fanni, . Sonato, and J. Zedda, Automatic disruption classification at JET: comparison of different pattern recognition techniques, Nuclear Fusion, vol.46, issue.7, 2006.
DOI : 10.1088/0029-5515/46/7/002

K. Chen and M. Hasegawa-Johnson, Modeling pronunciation variation using artificial neural networks for English spontaneous speech, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2004.

J. Chevelu, D. Lolive, S. L. Maguer, and D. Guennec, How to compare tts systems: A new subjective evaluation methodology focused on differences, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01199082

N. Chomsky, Syntactic Structures. Bod Third Party Titles, 2002.

H. H. Clark, Using Language, 1996.

H. H. Clark, Speaking in time, Speech Communication, vol.36, 2002.

H. H. Clark and J. E. Fox Tree, Using uh and um in spontaneous speaking, Cognition, vol.84, 2002.

B. Collins and I. M. Mees, Practical Phonetics and Phonology: A Resource Book for Students, 2013.

R. Comer and E. Gould, Psychology Around Us, 2010.

M. Corley and R. J. Hartsuiker, Hesitation in speech can... um... help a listener understand, Proceedings of Meeting of the Cognitive Science Society, 2003.

R. Dall, M. Tomalin, M. Wester, W. Byrne, and S. King, Investigating automatic & human filled pause insertion for speech synthesis, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2014.

R. I. Damper and J. F. Eastmond, Pronunciation by analogy: Impact of implementational choices on performance, Language and Speech, vol.40, 1997.

M. J. Dedina and H. C. Nusbaum, PRONOUNCE: a program for pronunciation by analogy, Computer Speech & Language, vol.5, 1991.

E. Delais-roussarie, M. Avanzi, and S. Herment, Prosody and Language in Contact: L2 Acquisition, Attrition and Languages in Multilingual Situations, 2015.
DOI : 10.1007/978-3-662-45168-7

P. Dilts, Modelling phonetic reduction in a corpus of spoken English using Random Forests and Mixed-Effects Regression, 2013.

T. Drugman, A. Moinet, and T. Dutoit, On the use of machine learning in statistical parametric speech synthesis, Proceedings of Benelearn, 2008.

D. Duez, Silent and Non-Silent Pauses in Three Speech Styles, Language and Speech, vol.15, issue.1, 1982.
DOI : 10.1044/jshr.1501.49

E. Farnetani and D. Recasens, Coarticulation and connected speech processes, The Handbook of Phonetic Sciences, 1997.
DOI : 10.1002/9781444317251.ch9

E. Fosler-lussier and N. Morgan, Effects of speaking rate and word frequency on conversational pronunciations, Modeling Pronunciation Variation for Automatic Speech Recognition, 1998.

E. Fosler-lussier and N. Morgan, Effects of speaking rate and word frequency on pronunciations in conversational speech, Speech Communication, vol.29, 1999.

E. Fosler-lussier, Multi-level decision trees for static and dynamic pronunciation models, Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 1999.

J. E. Fosler-lussier, Dynamic pronunciation models for automatic speech recognition, 1999.

C. A. Fowler and J. Housum, Talkers' signaling of "new" and "old" words in speech and listeners' perception and use of the distinction, Journal of Memory and Language, vol.26, 1987.

J. E. Fox Tree, The effects of false starts and repetitions on the processing of subsequent words in spontaneous speech, Journal of Memory and Language, vol.34, 1995.

J. E. Fox Tree, Listeners' uses of um and uh in speech comprehension, Memory & Cognition, vol.29, 2001.

J. E. Fox Tree and J. C. Schrock, Discourse markers in spontaneous speech: Oh what a difference an oh makes, Journal of Memory and Language, vol.40, 1999.

J. E. Fox Tree and J. C. Schrock, Basic meanings of you know and I mean, Journal of Pragmatics, vol.34, 2002.

W. J. Frawley, International encyclopedia of linguistics, 2003.

T. Fukada, T. Yoshimura, and Y. Sagisaka, Automatic generation of multiple pronunciations based on neural networks, Speech Communication, vol.27, issue.1, 1999.
DOI : 10.1016/S0167-6393(98)00066-1

F. Gallwitz, H. Niemann, E. Nöth, and V. Warnke, Integrated recognition of words and prosodic phrase boundaries, Speech Communication, vol.36, issue.1-2, 2002.
DOI : 10.1016/S0167-6393(01)00027-9

P. K. Ghosh and S. Narayanan, Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion, The Journal of the Acoustical Society of America, vol.130, 2011.

E. Giachin, A. Rosenberg, and C. Lee, Word juncture modeling using phonological rules for HMM-based continuous speech recognition, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1990.
DOI : 10.1109/icassp.1990.115893

R. Goldberg and L. Riek, A Practical Handbook of Speech Coders, 2000.
DOI : 10.1201/9781420036824

M. Goto, K. Itou, and S. Hayamizu, A real-time filled pause detection system for spontaneous speech recognition, Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 1999.

D. Govind and . Prasanna, Expressive speech synthesis: a review, International Journal of Speech Technology, vol.51, issue.4, 2013.
DOI : 10.1016/j.specom.2009.04.004

S. Greenberg, Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation, Speech Communication, vol.29, issue.2-4, 1999.
DOI : 10.1016/S0167-6393(99)00050-3

S. Greenberg, H. Carvey, and L. Hitchcock, The relation between stress accent and pronunciation variation in spontaneous American English discourse, Proceedings of Speech Prosody, 2002.

D. Guennec and D. Lolive, Unit Selection Cost Function Exploration Using an A* Based Text-to-Speech System, Proceedings of Text, Speech and Dialogue (TSD), 2014.
DOI : 10.1007/978-3-319-10816-2_52
URL : https://hal.archives-ouvertes.fr/hal-01133321

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, 2003.

K. Han and G. Chen, Letter-to-sound for small-footprint multilingual tts engine, Proceedings of International Conference on Spoken Language Processing (ICSLP), 2004.

W. J. Hardcastle and N. Hewlett, Coarticulation: Theory, Data and Techniques, 2006.

W. J. Hardcastle and A. Marchal, Speech production and speech modelling, 2012.

P. A. Heeman and J. F. Allen, Speech repairs, intonational phrases, and discourse markers: modeling speakers' utterances in spoken dialogue, Computational Linguistics, vol.25, 1999.

K. Hirose and J. Tao, Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis, 2015.
DOI : 10.1007/978-3-662-45258-5

U. Hain, Evaluation of modules and tools for speech synthesis: the ecess framework, LREC, 2008.

W. Holmes, Speech synthesis and recognition, 2001.

M. Honal and T. Schultz, Automatic Disfluency Removal on Recognized Spontaneous Speech - Rapid Adaptation to Speaker Dependent Disfluencies, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2005.
DOI : 10.1109/ICASSP.2005.1415277
URL : http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/336203/

X. Huang, A. Acero, and H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, 2001.

X. D. Huang, Y. Ariki, and M. A. Jack, Hidden Markov models for speech recognition, 1990.

S. Huet, G. Gravier, and P. Sébillot, Morpho-syntactic postprocessing of n-best lists for improved French automatic speech recognition, Computer Speech & Language, vol.24, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00508471

A. Iida, N. Campbell, F. Higuchi, and M. Yasumura, A corpus-based speech synthesis system with emotion, Speech Communication, vol.40, issue.1-2, 2003.
DOI : 10.1016/S0167-6393(02)00081-X

P. Jande, Phonological reduction in Swedish, Proceedings of ICPhS, 2003.

S. Jiampojamarn, Grapheme-to-phoneme conversion and its application to transliteration, 2011.

D. Jurafsky, W. Ward, Z. Banping, K. Herold, Y. Xiuyang et al., What kind of pronunciation variation is hard for triphones to model?, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2001.
DOI : 10.1109/ICASSP.2001.940897

P. Karanasou, Phonemic variability and confusability in pronunciation modeling for automatic speech recognition, 2013.
URL : https://hal.archives-ouvertes.fr/tel-00843589

P. Karanasou, F. Yvon, T. Lavergne, and L. Lamel, Discriminative training of a phoneme confusion model for a dynamic lexicon in ASR, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2013.

R. Karhila, U. Remes, and M. Kurimo, Noise in HMM-Based Speech Synthesis Adaptation: Analysis, Evaluation Methods and Experiments, IEEE Journal of Selected Topics in Signal Processing, vol.8, issue.2, 2014.
DOI : 10.1109/JSTSP.2013.2278492

M. Kaushik, M. Trinkle, and A. Hashemi-sakhtsari, Automatic detection and removal of disfluencies from spontaneous speech, Proceedings of the Australasian International Conference on Speech Science and Technology (SST), 2010.

A. K. Kienappel and R. Kneser, Designing very compact decision trees for grapheme-to-phoneme transcription, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2001.

S. King, A beginners' guide to statistical parametric speech synthesis. The Centre for Speech Technology Research, 2010.

S. King, An introduction to statistical parametric speech synthesis, Sadhana, vol.21, issue.1, 2011.
DOI : 10.1016/j.csl.2006.01.002

S. King and V. Karaiskos, The Blizzard Challenge 2012, Proceedings of Blizzard Challenge 2012 Workshop, 2012.

K. Kirchhoff, Robust speech recognition using articulatory information, 1999.

P. R. Kleinginna Jr. and A. M. Kleinginna, A categorized list of emotion definitions, with suggestions for a consensual definition, Motivation and Emotion, vol.5, 1981.

H. Kruschke, Simulation of Speaking Styles with Adapted Prosody, Proceedings of Text, Speech and Dialogue (TSD), 2001.
DOI : 10.1007/3-540-44805-5_37

J. Kupiec, Robust part-of-speech tagging using a hidden Markov model, Computer Speech & Language, vol.6, issue.3, 1992.
DOI : 10.1016/0885-2308(92)90019-Z

G. P. Laan, The contribution of intonation, segmental durations, and spectral features to the perception of a spontaneous and a read speaking style, Speech Communication, vol.22, 1997.

J. D. Lafferty, A. Mccallum, and F. C. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of International Conference on Machine Learning (ICML), 2001.

W. J. Levelt, Monitoring and self-repair in speech, Cognition, vol.14, 1983.

P. Lieberman, Some Effects of Semantic and Grammatical Context on the Production and Perception of Speech, Language and Speech, vol.33, issue.3, 1963.
DOI : 10.1121/1.1936662

Y. Liu, E. Shriberg, A. Stolcke, D. Hillard, M. Ostendorf et al., Enriching speech recognition with automatic detection of sentence boundaries and disfluencies, Transactions on Audio, Speech, and Language Processing, 2006.

K. Livescu, P. Jyothi, and E. Fosler-lussier, Articulatory feature-based pronunciation modeling, Computer Speech & Language, vol.36, 2016.
DOI : 10.1016/j.csl.2015.07.003
URL : https://doi.org/10.1016/j.csl.2015.07.003

G. Mahl, Disturbances in the patient's speech as a function of anxiety. Eastern Psychological Association, Trends in content analysis, 1959.

P. Boula-de-mareüil, B. Habert, F. Bénard, M. Adda-decker, C. Barras et al., A quantitative study of disfluencies in French broadcast interviews, Proceedings of Disfluency in Spontaneous Speech Workshop, 2005.

L. Marten, At the Syntax-pragmatics Interface: Verbal Underspecification and Concept Formation in Dynamic Syntax, 2002.
DOI : 10.1093/acprof:oso/9780199250639.001.0001

C. Miller, Individuation of postlexical phonology for speech synthesis, The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis, 1998.

L. Moats, LETRS: Language Essentials for Teachers of Reading and Spelling, Sopris West Educational Services, 2004.

T. Myers, J. Laver, and J. Anderson, The Cognitive Representation of Speech, 1981.

B. Oshika, V. W. Zue, R. Weeks, H. Neu, and J. Aurbach, The role of phonological rules in speech understanding research, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.23, issue.1, 1975.
DOI : 10.1109/TASSP.1975.1162639

V. Pagel, K. Lenzo, and A. Black, Letter to sound rules for accented lexicon compression. arXiv preprint cmp-lg/9808010, 1998.

A. Parlikar, Style-Specific Phrasing in Speech Synthesis, 2013.

N. Pathak and P. H. Talukdar, The basic grapheme to phoneme (G2P) rules for Bodo language, International Journal, issue.2, 2013.

J. F. Pitrelli, R. Bakis, E. M. Eide et al., The IBM expressive text-to-speech synthesis system for American English, Transactions on Audio, Speech, and Language Processing, 2006.

K. Prahallad, A. W. Black, and R. Mosur, Sub-Phonetic Modeling For Capturing Pronunciation Variations For Conversational Speech Synthesis, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.
DOI : 10.1109/ICASSP.2006.1660155

J. C. Príncipe, N. R. Euliano, and W. C. Lefebvre, Neural and adaptive systems: fundamentals through simulations, 2000.

V. Pulkki and M. Karjalainen, Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics, 2015.

J. R. Quinlan, Induction of decision trees, Machine Learning, 1986.
DOI : 10.1037/13135-000

K. S. Rao, Predicting Prosody from Text for Text-to-Speech Synthesis, 2012.
DOI : 10.1007/978-1-4614-1338-7

M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje et al., Stochastic pronunciation modelling from hand-labelled phonetic corpora, Speech Communication, vol.29, issue.2-4, 1999.
DOI : 10.1016/S0167-6393(99)00037-0
URL : http://www.clsp.jhu.edu/people/byrne/ppubs/pmod.etrw98.pdf

S. R. Rochester, The significance of pauses in spontaneous speech, Journal of Psycholinguistic Research, vol.2, 1973.

R. L. Rose, The communicative value of filled pauses in spontaneous speech, 1998.

A.-V. I. Rosti and S. Matsoukas, Combining outputs from multiple machine translation systems, Proceedings of NAACL-HLT, 2007.

J. A. Russell, A circumplex model of affect., Journal of Personality and Social Psychology, vol.39, issue.6, 1980.
DOI : 10.1037/h0077714
URL : https://hal.archives-ouvertes.fr/hal-01086372

P. A. Schreiber, Understanding prosody's role in reading acquisition, Theory into Practice, 1991.

M. Schröder, Expressive Speech Synthesis: Past, Present, and Possible Futures, Affective information processing, 2009.
DOI : 10.1007/978-1-84800-306-4_7

T. Schultz and K. Kirchhoff, Multilingual Speech Processing, 2006.

T. J. Sejnowski and C. R. Rosenberg, Parallel networks that learn to pronounce English text, 1987.

E. E. Shriberg, Phonetic consequences of speech disfluency, 1999.

E. E. Shriberg, Preliminaries to a theory of speech disfluencies, 1994.

P. Skandera and P. Burleigh, A Manual of English Phonetics and Phonology: Twelve Lessons with an Integrated Course in Phonetic Transcription, 2011.

A. Stolcke and E. Shriberg, Statistical language modeling for speech disfluencies, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1996.
DOI : 10.1109/ICASSP.1996.541118

A. Stolcke, E. Shriberg, R. A. Bates, M. Ostendorf, D. Hakkani et al., Automatic detection of sentence boundaries and disfluencies based on recognized words, Proceedings of International Conference on Spoken Language Processing (ICSLP), 1998.

A. Stolcke, J. Zheng, W. Wang, and V. Abrash, SRILM at sixteen: Update and outlook, Proceedings of Automatic Speech Recognition and Understanding Workshop (ASRU), 2011.

A. Stolcke, SRILM - an extensible language modeling toolkit, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2002.

H. Strik, J. M. Kessens, and M. Wester, Modeling pronunciation variation for automatic speech recognition, Proceedings of the European Speech Communication Association (ESCA) Workshop, 1998.

S. Sundaram and S. Narayanan, An empirical text transformation method for spontaneous speech synthesizers, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2003.

C. Sutton and A. Mccallum, An introduction to conditional random fields for relational learning. Introduction to statistical relational learning, 2006.
DOI : 10.1561/2200000013
URL : http://www.cs.umass.edu/%7Ecasutton/publications/crftut-fnt.pdf

M. Swerts, A. Wichmann, and R. Beun, Filled pauses as markers of discourse structure, Proceedings of International Conference on Spoken Language Processing (ICSLP), 1996.

G. Tajchman, E. Fosler, and D. Jurafsky, Building multiple pronunciation models for novel words using exploratory computational phonology, Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 1995.

P. Taylor, Hidden Markov models for grapheme to phoneme conversion, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2005.

P. Taylor, Text-to-speech synthesis, 2009.
DOI : 10.1017/CBO9780511816338

M. Tomalin, M. Wester, R. Dall, W. Byrne, and S. King, A lattice-based approach to automatic filled pause insertion, Proceedings of the Disfluency in Spontaneous Speech (DiSS) Workshop, 2015.

S. Tseng, Grammar, prosody and speech disfluencies in spoken dialogues. Unpublished doctoral dissertation, 1999.

J. V. Tu, Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes, Journal of Clinical Epidemiology, vol.49, 1996.

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., WaveNet: A generative model for raw audio, Proceedings of ISCA Speech Synthesis Workshop.

B. Vazirnezhad, F. Almasganj, and S. M. Ahadi, Hybrid statistical pronunciation models designed to be trained by a medium-size corpus, Computer Speech & Language, vol.23, issue.1, 2009.
DOI : 10.1016/j.csl.2008.02.001

R. L. Weide, The CMU pronouncing dictionary, 1998.

C. W. Wightman and M. Ostendorf, Automatic labeling of prosodic patterns, Transactions on Speech and Audio Processing, 1994.

J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis, IEICE Transactions on Information and Systems, vol.88, issue.3, 2005.
DOI : 10.1093/ietisy/e88-d.3.502

S. Yin, C. Liu, Z. Zhang, Y. Lin, D. Wang et al., Noisy training for deep neural networks in speech recognition, EURASIP Journal on Audio, Speech, and Music Processing, 2015.

H. You, A. Alwan, A. Kazemzadeh, and S. Narayanan, Pronunciation variations of Spanish-accented English spoken by young children, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2005.

Y. You, Audio Coding: Theory and Applications, 2010.
DOI : 10.1007/978-1-4419-1754-6

H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko et al., The HMM-based speech synthesis system (HTS) version 2.0, Proceedings of SSW, 2007.

H. Zen, K. Tokuda, and A. W. Black, Statistical parametric speech synthesis, Speech Communication, vol.51, issue.11, 2009.
DOI : 10.1016/j.specom.2009.04.004
URL : https://hal.archives-ouvertes.fr/hal-00746106

H. Zen, A. Senior, and M. Schuster, Statistical parametric speech synthesis using deep neural networks, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.