Probabilistic Speaker Pronunciation Adaptation for Spontaneous Speech Synthesis Using Linguistic Features, Proceedings of the International Conference on Statistical Language and Speech Processing, 2015. ,
DOI : 10.1109/LSP.2010.2098440
URL : https://hal.archives-ouvertes.fr/hal-01181192
Improving TTS with Corpus-Specific Pronunciation Adaptation, Interspeech 2016, 2016. ,
DOI : 10.21437/Interspeech.2016-864
URL : https://hal.archives-ouvertes.fr/hal-01338111
Optimal Feature Set and Minimal Training Size for Pronunciation Adaptation in TTS, Proceedings of the International Conference on Statistical Language and Speech Processing, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01338853
Adaptation de la prononciation pour la synthèse de la parole spontanée en utilisant des informations linguistiques ,
Ajout automatique de disfluences pour la synthèse de la parole spontanée : formalisation et preuve de concept ,
Pronunciation variants across system configuration, language and speaking style, Speech Communication, vol.29, issue.2-4, 1999. ,
DOI : 10.1016/S0167-6393(99)00032-1
URL : ftp://tlp.limsi.fr/public/spc99pron.ps.Z
Pronunciation variants in French: schwa & liaison, International Congress of Phonetic Sciences, 1999. ,
Investigating syllabic structures and their variation in spontaneous French, Speech Communication, vol.46, issue.2, 2005. ,
DOI : 10.1016/j.specom.2005.03.006
Filled Pauses in Speech Synthesis: Towards Conversational Speech, Proceedings of Text, Speech and Dialogue (TSD), 2007. ,
DOI : 10.1007/978-3-540-74628-7_47
URL : http://gps-tsc.upc.es/veu/research/pubs/download/Ade_Fil_07.pdf
On the generation of synthetic disfluent speech: local prosodic modifications caused by the insertion of editing terms, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2008. ,
Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence, Speech Communication, vol.54, issue.3, 2012. ,
DOI : 10.1016/j.specom.2011.10.010
Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection, Speech Prosody, 2010. ,
Language accent classification in American English, Speech Communication, vol.18, 1996. ,
Introducing Phonetic Science, 2005. ,
DOI : 10.1017/CBO9780511808852
Vocal Expression and Perception of Emotion, Current Directions in Psychological Science, vol.15, issue.2, 1999. ,
DOI : 10.1007/BF00995674
Modeling pronunciation variation in conversational speech using prosody, Proceedings of ISCA Workshop on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (ITRW), 2002. ,
Effects of disfluencies, predictability, and utterance position on word form variation in English conversation, The Journal of the Acoustical Society of America, vol.113, issue.2, 2003. ,
DOI : 10.1121/1.1534836
Predictability effects on durations of content and function words in conversational English, Journal of Memory and Language, vol.60, issue.1, 2009. ,
DOI : 10.1016/j.jml.2008.06.003
Transformation of expressivity in speech, Linguistic Insights, 2009. ,
Using acoustic models to choose pronunciation variations for synthetic voices, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2003. ,
Neural networks for pattern recognition, 1995. ,
How listeners compensate for disfluencies in spontaneous speech, Journal of Memory and Language, vol.44, 2001. ,
Concise encyclopedia of semantics, 2010. ,
Signal Processing for Multimedia, 1999. ,
Automatic disruption classification at JET: comparison of different pattern recognition techniques, Nuclear Fusion, vol.46, issue.7, 2006. ,
DOI : 10.1088/0029-5515/46/7/002
Modeling pronunciation variation using artificial neural networks for English spontaneous speech, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2004. ,
How to compare TTS systems: A new subjective evaluation methodology focused on differences, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01199082
Syntactic Structures. Bod Third Party Titles, 2002. ,
Using Language, 1996. ,
Speaking in time, Speech Communication, vol.36, 2002. ,
Using uh and um in spontaneous speaking, Cognition, vol.84, 2002. ,
Practical Phonetics and Phonology: A Resource Book for Students, 2013. ,
Psychology Around Us, 2010. ,
Hesitation in speech can... um... help a listener understand, Proceedings of Meeting of the Cognitive Science Society, 2003. ,
Investigating automatic & human filled pause insertion for speech synthesis, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2014. ,
Pronunciation by analogy: Impact of implementational choices on performance, Language and Speech, vol.40, 1997. ,
PRONOUNCE: a program for pronunciation by analogy, Computer Speech & Language, vol.5, 1991. ,
Prosody and Language in Contact: L2 Acquisition, Attrition and Languages in Multilingual Situations, 2015. ,
DOI : 10.1007/978-3-662-45168-7
Modelling phonetic reduction in a corpus of spoken English using Random Forests and Mixed-Effects Regression, 2013. ,
On the use of machine learning in statistical parametric speech synthesis, Proceedings of Benelearn, 2008. ,
Silent and Non-Silent Pauses in Three Speech Styles, Language and Speech, vol.15, issue.1, 1982. ,
DOI : 10.1044/jshr.1501.49
Coarticulation and connected speech processes. The handbook of phonetic sciences, 1997. ,
DOI : 10.1002/9781444317251.ch9
Effects of speaking rate and word frequency on conversational pronunciations, Modeling Pronunciation Variation for Automatic Speech Recognition, 1998. ,
Effects of speaking rate and word frequency on pronunciations in conversational speech, Speech Communication, vol.29, 1999. ,
Multi-level decision trees for static and dynamic pronunciation models, Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 1999. ,
Dynamic pronunciation models for automatic speech recognition, 1999. ,
Talkers' signaling of "new" and "old" words in speech and listeners' perception and use of the distinction, Journal of Memory and Language, vol.26, 1987. ,
The effects of false starts and repetitions on the processing of subsequent words in spontaneous speech, Journal of Memory and Language, vol.34, 1995. ,
Listeners' uses of um and uh in speech comprehension, Memory & Cognition, vol.29, 2001. ,
Discourse markers in spontaneous speech: Oh what a difference an oh makes, Journal of Memory and Language, vol.40, 1999. ,
Basic meanings of you know and I mean, Journal of Pragmatics, vol.34, 2002. ,
International encyclopedia of linguistics, 2003. ,
Automatic generation of multiple pronunciations based on neural networks, Speech Communication, vol.27, issue.1, 1999. ,
DOI : 10.1016/S0167-6393(98)00066-1
Integrated recognition of words and prosodic phrase boundaries, Speech Communication, vol.36, issue.1-2, 2002. ,
DOI : 10.1016/S0167-6393(01)00027-9
Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion, The Journal of the Acoustical Society of America, vol.130, 2011. ,
Word juncture modeling using phonological rules for HMM-based continuous speech recognition, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1990. ,
DOI : 10.1109/icassp.1990.115893
A Practical Handbook of Speech Coders, 2000. ,
DOI : 10.1201/9781420036824
A real-time filled pause detection system for spontaneous speech recognition, Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 1999. ,
Expressive speech synthesis: a review, International Journal of Speech Technology, vol.51, issue.4, 2013. ,
DOI : 10.1016/j.specom.2009.04.004
Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation, Speech Communication, vol.29, issue.2-4, 1999. ,
DOI : 10.1016/S0167-6393(99)00050-3
The relation between stress accent and pronunciation variation in spontaneous American English discourse, Proceedings of Speech Prosody, 2002. ,
Unit Selection Cost Function Exploration Using an A* Based Text-to-Speech System, Proceedings of Text, Speech and Dialogue (TSD), 2014. ,
DOI : 10.1007/978-3-319-10816-2_52
URL : https://hal.archives-ouvertes.fr/hal-01133321
An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, 2003. ,
Letter-to-sound for small-footprint multilingual TTS engine, Proceedings of International Conference on Spoken Language Processing (ICSLP), 2004. ,
Coarticulation: Theory, Data and Techniques, 2006. ,
Speech production and speech modelling, 2012. ,
Speech repairs, intonational phrases, and discourse markers: modeling speakers' utterances in spoken dialogue, Computational Linguistics, vol.25, 1999. ,
Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis, 2015. ,
DOI : 10.1007/978-3-662-45258-5
Evaluation of modules and tools for speech synthesis: the ECESS framework, LREC, 2008. ,
Speech synthesis and recognition, 2001. ,
Automatic Disfluency Removal on Recognized Spontaneous Speech - Rapid Adaptation to Speaker Dependent Disfluencies, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005. ,
DOI : 10.1109/ICASSP.2005.1415277
URL : http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/336203/
Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, 2001. ,
Hidden Markov models for speech recognition, 1990. ,
Morpho-syntactic postprocessing of n-best lists for improved French automatic speech recognition, Computer Speech & Language, vol.24, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00508471
A corpus-based speech synthesis system with emotion, Speech Communication, vol.40, issue.1-2, 2003. ,
DOI : 10.1016/S0167-6393(02)00081-X
Phonological reduction in Swedish, Proceedings of ICPhS, 2003. ,
Grapheme-to-phoneme conversion and its application to transliteration, 2011. ,
What kind of pronunciation variation is hard for triphones to model?, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001. ,
DOI : 10.1109/ICASSP.2001.940897
Phonemic variability and confusability in pronunciation modeling for automatic speech recognition, 2013. ,
URL : https://hal.archives-ouvertes.fr/tel-00843589
Discriminative training of a phoneme confusion model for a dynamic lexicon in ASR, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2013. ,
Noise in HMM-Based Speech Synthesis Adaptation: Analysis, Evaluation Methods and Experiments, IEEE Journal of Selected Topics in Signal Processing, vol.8, issue.2, 2014. ,
DOI : 10.1109/JSTSP.2013.2278492
Automatic detection and removal of disfluencies from spontaneous speech, Proceedings of the Australasian International Conference on Speech Science and Technology (SST), 2010. ,
Designing very compact decision trees for grapheme-to-phoneme transcription, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2001. ,
A beginners' guide to statistical parametric speech synthesis. The Centre for Speech Technology Research, 2010. ,
An introduction to statistical parametric speech synthesis, Sadhana, vol.21, issue.1, 2011. ,
DOI : 10.1016/j.csl.2006.01.002
The Blizzard Challenge 2012, Proceedings of Blizzard Challenge 2012 Workshop, 2012. ,
Robust speech recognition using articulatory information, 1999. ,
A categorized list of emotion definitions, with suggestions for a consensual definition, Motivation and Emotion, vol.5, 1981. ,
Simulation of Speaking Styles with Adapted Prosody, Proceedings of Text, Speech and Dialogue (TSD), 2001. ,
DOI : 10.1007/3-540-44805-5_37
Robust part-of-speech tagging using a hidden Markov model, Computer Speech & Language, vol.6, issue.3, 1992. ,
DOI : 10.1016/0885-2308(92)90019-Z
The contribution of intonation, segmental durations, and spectral features to the perception of a spontaneous and a read speaking style, Speech Communication, vol.22, 1997. ,
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of International Conference on Machine Learning (ICML), 2001. ,
Monitoring and self-repair in speech, Cognition, vol.14, 1983. ,
Some Effects of Semantic and Grammatical Context on the Production and Perception of Speech, Language and Speech, vol.33, issue.3, 1963. ,
DOI : 10.1121/1.1936662
Enriching speech recognition with automatic detection of sentence boundaries and disfluencies, Transactions on Audio, Speech, and Language Processing, 2006. ,
Articulatory feature-based pronunciation modeling, Computer Speech & Language, vol.36, 2016. ,
DOI : 10.1016/j.csl.2015.07.003
URL : https://doi.org/10.1016/j.csl.2015.07.003
Disturbances in the patient's speech as a function of anxiety. Eastern Psychological Association, Trends in content analysis, 1959. ,
A quantitative study of disfluencies in French broadcast interviews, Proceedings of Disfluency in Spontaneous Speech Workshop, 2005. ,
At the Syntax-pragmatics Interface: Verbal Underspecification and Concept Formation in Dynamic Syntax, 2002. ,
DOI : 10.1093/acprof:oso/9780199250639.001.0001
Individuation of postlexical phonology for speech synthesis, The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis, 1998. ,
Language Essentials for Teachers of Reading and Spelling, Sopris West Educational Services, 2004. ,
The Cognitive Representation of Speech, 1981. ,
The role of phonological rules in speech understanding research, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.23, issue.1, 1975. ,
DOI : 10.1109/TASSP.1975.1162639
Letter to sound rules for accented lexicon compression. arXiv preprint cmp-lg/9808010, 1998. ,
Style-Specific Phrasing in Speech Synthesis, 2013. ,
The basic grapheme to phoneme (G2P) rules for Bodo language, International Journal, issue.2, 2013. ,
The IBM expressive text-to-speech synthesis system for American English, Transactions on Audio, Speech, and Language Processing, 2006. ,
Sub-Phonetic Modeling For Capturing Pronunciation Variations For Conversational Speech Synthesis, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006. ,
DOI : 10.1109/ICASSP.2006.1660155
Neural and adaptive systems: fundamentals through simulations, 2000. ,
Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics, 2015. ,
Induction of decision trees, Machine Learning, 1986. ,
DOI : 10.1037/13135-000
Predicting Prosody from Text for Text-to-Speech Synthesis, 2012. ,
DOI : 10.1007/978-1-4614-1338-7
Stochastic pronunciation modelling from hand-labelled phonetic corpora, Speech Communication, vol.29, issue.2-4, 1999. ,
DOI : 10.1016/S0167-6393(99)00037-0
URL : http://www.clsp.jhu.edu/people/byrne/ppubs/pmod.etrw98.pdf
The significance of pauses in spontaneous speech, Journal of Psycholinguistic Research, vol.2, 1973. ,
The communicative value of filled pauses in spontaneous speech, 1998. ,
Combining outputs from multiple machine translation systems, Proceedings of NAACL-HLT, 2007. ,
A circumplex model of affect, Journal of Personality and Social Psychology, vol.39, issue.6, 1980. ,
DOI : 10.1037/h0077714
URL : https://hal.archives-ouvertes.fr/hal-01086372
Understanding prosody's role in reading acquisition. Theory into Practice, 1991. ,
Expressive Speech Synthesis: Past, Present, and Possible Futures, Affective information processing, 2009. ,
DOI : 10.1007/978-1-84800-306-4_7
Multilingual Speech Processing, 2006. ,
Parallel networks that learn to pronounce English text, 1987. ,
Phonetic consequences of speech disfluency, 1999. ,
Preliminaries to a theory of speech disfluencies, 1994. ,
A Manual of English Phonetics and Phonology: Twelve Lessons with an Integrated Course in Phonetic Transcription, 2011. ,
Statistical language modeling for speech disfluencies, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996. ,
DOI : 10.1109/ICASSP.1996.541118
Automatic detection of sentence boundaries and disfluencies based on recognized words, Proceedings of International Conference on Spoken Language Processing (ICSLP), 1998. ,
SRILM at sixteen: Update and outlook, Proceedings of Automatic Speech Recognition and Understanding Workshop (ASRU), 2011. ,
SRILM - an extensible language modeling toolkit, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2002. ,
Modeling pronunciation variation for automatic speech recognition, Proceedings of the European Speech Communication Association (ESCA) Workshop, 1998. ,
An empirical text transformation method for spontaneous speech synthesizers, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2003. ,
An introduction to conditional random fields for relational learning. Introduction to statistical relational learning, 2006. ,
DOI : 10.1561/2200000013
URL : http://www.cs.umass.edu/%7Ecasutton/publications/crftut-fnt.pdf
Filled pauses as markers of discourse structure, Proceedings of International Conference on Spoken Language Processing (ICSLP), 1996. ,
Building multiple pronunciation models for novel words using exploratory computational phonology, Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 1995. ,
Hidden Markov models for grapheme to phoneme conversion, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2005. ,
Text-to-speech synthesis, 2009. ,
DOI : 10.1017/CBO9780511816338
A lattice-based approach to automatic filled pause insertion, Proceedings of the Disfluency in Spontaneous Speech (DiSS) Workshop, 2015. ,
Grammar, prosody and speech disfluencies in spoken dialogues. Unpublished doctoral dissertation, 1999. ,
Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes, Journal of Clinical Epidemiology, vol.49, 1996. ,
WaveNet: A generative model for raw audio, Proceedings of ISCA Speech Synthesis Workshop, 2016. ,
Hybrid statistical pronunciation models designed to be trained by a medium-size corpus, Computer Speech & Language, vol.23, issue.1, 2009. ,
DOI : 10.1016/j.csl.2008.02.001
The CMU pronouncing dictionary, 1998. ,
Automatic labeling of prosodic patterns, Transactions on Speech and Audio Processing, 1994. ,
Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis, IEICE Transactions on Information and Systems, vol.88, issue.3, 2005. ,
DOI : 10.1093/ietisy/e88-d.3.502
Noisy training for deep neural networks in speech recognition, EURASIP Journal on Audio, Speech, and Music Processing, 2015. ,
Pronunciation variations of Spanish-accented English spoken by young children, Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), 2005. ,
Audio Coding: Theory and Applications, 2010. ,
DOI : 10.1007/978-1-4419-1754-6
The HMM-based speech synthesis system (HTS) version 2.0, Proceedings of SSW, 2007. ,
Statistical parametric speech synthesis, Speech Communication, vol.51, issue.11, 2009. ,
DOI : 10.1016/j.specom.2009.04.004
URL : https://hal.archives-ouvertes.fr/hal-00746106
Statistical parametric speech synthesis using deep neural networks, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013. ,