O. Abdel-Hamid et al., Exploring convolutional neural network structures and optimization techniques for speech recognition, Proc. INTERSPEECH, pp.3366-3370, 2013.

B. S. Atal and S. L. Hanauer, Speech Analysis and Synthesis by Linear Prediction of the Speech Wave, The Journal of the Acoustical Society of America, vol.50, issue.2B, pp.637-655, 1971.
DOI : 10.1121/1.1912679

L. R. Bahl et al., Maximum mutual information estimation of hidden Markov model parameters for speech recognition, Proc. ICASSP '86, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.49-52, 1986.
DOI : 10.1109/ICASSP.1986.1169179

L. E. Baum and J. A. Eagon, An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology, Bulletin of the American Mathematical Society, vol.73, issue.3, pp.360-363, 1967.
DOI : 10.1090/S0002-9904-1967-11751-8

L. E. Baum and T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, The Annals of Mathematical Statistics, pp.1554-1563, 1966.

F. Beaufays et al., Unsupervised discovery and training of maximally dissimilar cluster models, Proc. INTERSPEECH, pp.66-69, 2010.

J. R. Bellegarda, Statistical language model adaptation: review and perspectives, Speech Communication, vol.42, issue.1, pp.93-108, 2004.

M. Benzeghiba et al., Automatic speech recognition and speech variability: A review, Speech Communication, vol.49, issue.10-11, pp.763-786, 2007.
DOI : 10.1016/j.specom.2007.02.006

URL : https://hal.archives-ouvertes.fr/inria-00616506

J. A. Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, International Computer Science Institute, vol.4, issue.510, 1998.

J. A. Bilmes, Buried Markov models for speech recognition, Proc. ICASSP, vol.2, pp.713-716, 1999.

J. A. Bilmes, Graphical models and automatic speech recognition, Mathematical foundations of speech and language processing, pp.191-245, 2004.

M. Bisani and H. Ney, Joint-sequence models for grapheme-to-phoneme conversion, Speech Communication, vol.50, issue.5, pp.434-451, 2008.
DOI : 10.1016/j.specom.2008.01.002

URL : https://hal.archives-ouvertes.fr/hal-00499203

H. Bourlard and N. Morgan, Connectionist speech recognition: a hybrid approach, 1994.

J. S. Bridle, Towards better understanding of the model implied by the use of dynamic features in HMMs, Proc. ICSLP, pp.725-728.

D. C. Burnett and M. Fanty, Rapid unsupervised adaptation to children's speech on a connected-digit task, Proc. ICSLP, pp.1145-1148, 1996.

Neologos: an optimized database for the development of new speech processing algorithms, Proc. INTERSPEECH.

C. Cieri et al., The Fisher corpus: a resource for the next generations of speech-to-text, Proc. LREC, pp.69-71, 2004.

J. Cohen et al., Vocal tract normalization in speech recognition: Compensating for systematic speaker variability, The Journal of the Acoustical Society of America, vol.97, issue.5, pp.3246-3247, 1995.
DOI : 10.1121/1.411700

G. E. Dahl et al., Context-dependent pre-trained deep neural networks for large vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.30-42, 2012.

S. B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.28, issue.4, pp.357-366, 1980.

K. H. Davis et al., Automatic Recognition of Spoken Digits, The Journal of the Acoustical Society of America, vol.24, issue.6, p.637, 1952.
DOI : 10.1121/1.1906946

N. Dehak et al., Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, pp.788-798.

Improvements to the LIUM French ASR system based on CMU Sphinx: what helps to significantly reduce the word error rate?, Proc. INTERSPEECH, 2009.

A. P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), pp.1-38, 1977.

L. Deng et al., Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states, IEEE Transactions on Speech and Audio Processing, vol.2, issue.4, pp.507-520, 1994.

L. Deng and X. Li, Machine learning paradigms for speech recognition: An overview, IEEE Transactions on Audio, Speech, and Language Processing.

L. Deng, Dynamic speech models: theory, algorithms, and applications, Synthesis Lectures on Speech and Audio Processing, pp.1-30, 2006.

D. H. Klatt, Review of the DARPA speech understanding project, The Journal of the Acoustical Society of America, vol.62, pp.1345-1366, 1977.

V. Digalakis et al., A dynamical system approach to continuous speech recognition, Proc. ICASSP 91, International Conference on Acoustics, Speech, and Signal Processing, pp.289-292, 1991.
DOI : 10.1109/ICASSP.1991.150334

V. Digalakis et al., Speaker adaptation using constrained estimation of Gaussian mixtures, IEEE Transactions on Speech and Audio Processing, vol.3, issue.5, pp.357-366, 1995.

J.-M. Dolmazon et al., Organisation de la première campagne AUPELF pour l'évaluation des systèmes de dictée vocale, Proc. l'AUPELF-UREF, pp.13-18, 1997.

L. D. Erman et al., The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty, ACM Computing Surveys, vol.12, issue.2, pp.213-253, 1980.
DOI : 10.1145/356810.356816

J. G. Fiscus, A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER), Proc. ASRU workshop, pp.347-354, 1997.

Fukuda et al., Constructing ensembles of dissimilar acoustic models using hidden attributes of training data, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4141-4144, 2012.
DOI : 10.1109/ICASSP.2012.6288830

M. Gales and S. Young, The application of hidden Markov models in speech recognition, Foundations and Trends in Signal Processing, vol.1, issue.3, pp.195-304, 2008.

S. Galliano et al., The ESTER phase II evaluation campaign for the rich transcription of French broadcast news, Proc. INTERSPEECH, pp.1149-1152, 2005.

S. Galliano et al., The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, Proc. INTERSPEECH, pp.2583-2586, 2009.

J. S. Garofolo et al., DARPA TIMIT acoustic-phonetic continous speech corpus CD-ROM, NIST speech disc 1-1.1, p.27403, 1993.

J.-L. Gauvain and C.-H. Lee, MAP estimation of continuous density HMM, Proceedings of the workshop on Speech and Natural Language, HLT '91, pp.185-190, 1992.
DOI : 10.3115/1075527.1075568

J.-L. Gauvain and C.-H. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, vol.2, issue.2, pp.291-298, 1994.

L. Gillick and S. J. Cox, Some statistical issues in the comparison of speech recognition algorithms, International Conference on Acoustics, Speech, and Signal Processing, pp.532-535, 1989.
DOI : 10.1109/ICASSP.1989.266481

H. Gish and K. Ng, A segmental speech model with applications to word spotting, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.447-450, 1993.
DOI : 10.1109/ICASSP.1993.319337

J. J. Godfrey et al., SWITCHBOARD: Telephone speech corpus for research and development, Proc. ICASSP, pp.517-520, 1992.

Statistical trajectory models for phonetic recognition.

Y. Gong, Stochastic trajectory modeling and sentence searching for continuous speech recognition, IEEE Transactions on Speech and Audio Processing.

A. Gorin and D. Jouvet, Class-based speech recognition using a maximum dissimilarity criterion and a tolerance classification margin, 2012 IEEE Spoken Language Technology Workshop (SLT), pp.91-96, 2012.
DOI : 10.1109/SLT.2012.6424203

URL : https://hal.archives-ouvertes.fr/hal-00753454

A. Gorin and D. Jouvet, Efficient constrained parametrization of GMM with class-based mixture weights for automatic speech recognition, Proc. LTC-6th Language & Technologies Conference, pp.550-554, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00923202

A. Gorin and D. Jouvet, Component structuring and trajectory modeling for speech recognition, Proc. INTERSPEECH.
URL : https://hal.archives-ouvertes.fr/hal-01063653

A. Gorin and D. Jouvet, Modélisation de trajectoires et de classes de locuteurs pour la reconnaissance de voix d'enfants et d'adultes, Proc. JEP, Le Mans, 2014.

A. Gorin and D. Jouvet, Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech, SLSP.
DOI : 10.1007/978-3-319-11397-5_8

URL : https://hal.archives-ouvertes.fr/hal-01090472

A. Gorin et al., Investigating stranded GMM for improving automatic speech recognition, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp.192-196, 2014.
DOI : 10.1109/HSCMA.2014.6843278

URL : https://hal.archives-ouvertes.fr/hal-01003054

D. Graff, The 1996 broadcast news speech and language-model corpus, Proc. DARPA SLT, pp.11-14, 1997.

G. Gravier et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, Proc. LREC.

Y. Han et al., Trajectory clustering for solving the trajectory folding problem in automatic speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.4, pp.1425-1434, 2007.

T. J. Hazen and J. R. Glass, A comparison of novel techniques for instantaneous speaker adaptation, Proc. EUROSPEECH, pp.2047-2050, 1997.

C. T. Hemphill et al., The ATIS spoken language systems pilot corpus, Proc. DARPA speech and natural language workshop, pp.96-101, 1990.

H. Hermansky et al., Tandem connectionist feature extraction for conventional HMM systems, Proc. ICASSP.

G. E. Hinton et al., A fast learning algorithm for deep belief nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.

G. E. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Magazine.

W. J. Holmes and M. J. Russell, Probabilistic-trajectory segmental HMMs, Computer Speech & Language, vol.13, issue.1, pp.3-37, 1999.
DOI : 10.1006/csla.1998.0048

C. Huang et al., Analysis of speaker variability, Proc. INTERSPEECH, pp.1377-1380, 2001.

M.-Y. Hwang and X. Huang, Shared-distribution hidden Markov models for speech recognition, IEEE Transactions on Speech and Audio Processing, vol.1, issue.4, pp.414-420, 1993.

M.-Y. Hwang, Subphonetic acoustic modeling for speaker-independent continuous speech recognition, 1993.

I. Illina et al., The automatic news transcription system: ANTS, Proc. INTERSPEECH, pp.377-380, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00100043

I. Illina et al., Grapheme-to-phoneme conversion using conditional random fields, Proc. INTERSPEECH, pp.2313-2316, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00614981

F. Jelinek et al., Design of a linguistic statistical decoder for the recognition of continuous speech, IEEE Transactions on Information Theory, vol.21, issue.3, pp.250-256.

F. Jelinek, Fast sequential decoding algorithm using a stack, IBM Journal of Research and Development, vol.13, issue.6, pp.675-685, 1969.

F. Jelinek, Self-organized language modeling for speech recognition, Readings in speech recognition, pp.450-506, 1990.
DOI : 10.1016/B978-0-08-051584-7.50045-0

F. Jelinek, Statistical methods for speech recognition, 1997.

D. Jouvet and D. Fohr, Analysis and Combination of Forward and Backward based Decoders for Improved Speech Transcription, Proc. TSD, pp.84-91, 2013.

D. Jouvet and D. Fohr, Combining forward-based and backward-based decoders for improved speech recognition performance, Proc. INTERSPEECH, pp.652-656.
URL : https://hal.archives-ouvertes.fr/hal-00834282

D. Jouvet and D. Langlois, A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription, Proc. TSD, pp.60-67, 2013.
DOI : 10.1007/978-3-642-40585-3_9

URL : https://hal.archives-ouvertes.fr/hal-00834302

D. Jouvet and N. Vinuesa, Classification margin for improved class-based speech recognition performance, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4285-4288, 2012.
DOI : 10.1109/ICASSP.2012.6288866

URL : https://hal.archives-ouvertes.fr/hal-00753345

D. Jouvet et al., Evaluating grapheme-to-phoneme converters in automatic speech recognition context, Proc. ICASSP, pp.4821-4824, 2012.
DOI : 10.1109/icassp.2012.6288998

URL : https://hal.inria.fr/hal-00753364/document

D. Jouvet et al., Exploitation d'une marge de tolérance de classification pour améliorer l'apprentissage de modèles acoustiques de classes en reconnaissance de la parole, Proc. JEP-TALN-RECITAL, pp.763-770, 2012.

B.-H. Juang and L. R. Rabiner, Mixture autoregressive hidden Markov models for speech signals, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.33, issue.6, pp.1404-1413, 1985.

B.-H. Juang and L. R. Rabiner, Automatic speech recognition - A brief history of the technology development, Encyclopedia of Language and Linguistics, 2005.

B.-H. Juang et al., Minimum classification error rate methods for speech recognition, IEEE Transactions on Speech and Audio Processing, vol.5, issue.3, pp.257-265, 1997.

D. Jurafsky and J. H. Martin, Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition.

S. Kajarekar et al., Analysis of sources of variability in speech, Proc. EUROSPEECH, pp.343-346, 1999.

D. Y. Kim et al., Using VTLN for broadcast news transcription, Proc. ICSLP, pp.1953-1956, 2004.

D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques.

F. Korkmazskiy, B.-H. Juang, and F. Soong, Generalized mixture of HMMs for continuous speech recognition, Proc. ICASSP, pp.1443-1446, 1997.

R. Kuhn et al., Rapid speaker adaptation in eigenvoice space, IEEE Transactions on Speech and Audio Processing, vol.8, issue.6, pp.695-707, 2000.
DOI : 10.1109/89.876308

A. Lammert et al., Statistical methods for estimation of direct and differential kinematics of the vocal tract, Speech Communication, vol.55, issue.1, pp.147-161, 2013.
DOI : 10.1016/j.specom.2012.08.001

A. Lawson et al., Effect of foreign accent on speech recognition in the NATO N-4 corpus, Proc. EUROSPEECH, pp.1505-1508, 2003.

V.-B. Le et al., Speaker diarization using normalized cross likelihood ratio, Proc. INTERSPEECH, pp.1869-1872, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00163855

K.-F. Lee et al., An overview of the SPHINX speech recognition system, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.38, issue.1, pp.35-45, 1990.

A. Lee et al., Julius - an open source real-time large vocabulary recognition engine, Proc. EUROSPEECH, pp.1691-1694, 2001.

K.-F. Lee, Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.38, issue.4, pp.599-609, 1990.

C. J. Leggetter and P. C. Woodland, Speaker adaptation of HMMs using linear regression, 1994.

S. Liu and K. C. Sim, Implicit trajectory modelling using temporally varying weight regression for automatic speech recognition, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4761-4764, 2012.
DOI : 10.1109/ICASSP.2012.6288983

S. Liu and K. C. Sim, An investigation of temporally varying weight regression for noise robust speech recognition, Proc. INTERSPEECH, pp.2963-2967.

F.-H. Liu et al., Efficient cepstral normalization for robust speech recognition, Proceedings of the workshop on Human Language Technology, HLT '93, pp.69-74, 1993.
DOI : 10.3115/1075671.1075688

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.421.3299

Liu et al., Trust region-based optimization for maximum mutual information estimation of HMMs in speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.8, pp.2474-2485, 2011.

B. T. Lowerre, The Harpy speech recognition system, 1976.

B. Mak et al., Improving Reference Speaker Weighting Adaptation by the Use of Maximum-Likelihood Reference Speakers, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1659999

M. Z. Mao et al., Automatic Training Set Segmentation for Multi-pass Speech Recognition, Proc. ICASSP '05, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.685-688, 2005.
DOI : 10.1109/ICASSP.2005.1415206

E. McDermott et al., Discriminative training for large-vocabulary speech recognition using minimum classification error, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.1, pp.203-223, 2007.

N. Morgan et al., Pushing the envelope - aside [speech recognition], IEEE Signal Processing Magazine, pp.81-88, 2005.
DOI : 10.1109/MSP.2005.1511826

A. Ozerov and E. Vincent, Using the FASST source separation toolbox for noise robust speech recognition.

S. Panchapagesan and A. Alwan, Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC, Computer Speech and Language, pp.42-64, 2009.

J. F. Pitrelli et al., PhoneBook: A phonetically-rich isolated-word telephone-speech database, Proc. ICASSP, 1995.

A. B. Poritz, Linear predictive hidden Markov models and the speech signal, Proc. ICASSP, pp.1291-1294, 1982.

D. Povey and P. C. Woodland, Minimum phone error and I-smoothing for improved discriminative training, Proc. ICASSP, p.I-105, 2002.

D. Povey and K. Yao, A basis representation of constrained MLLR transforms for robust adaptation, Computer Speech & Language, pp.35-51.
DOI : 10.1016/j.csl.2011.04.002

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE, pp.257-286, 1989.

M. Ravishankar, Efficient algorithms for speech recognition, 1996.

D. Rybach et al., The RWTH Aachen University open source speech recognition system, Proc. INTERSPEECH, pp.2111-2114, 2009.

H. Sak, A. Senior, and F. Beaufays, Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition, arXiv preprint, 2014.

M. Saraclar et al., Pronunciation modeling by sharing Gaussian densities across phonetic models, Computer Speech & Language, vol.14, issue.2, pp.137-160, 2000.
DOI : 10.1006/csla.2000.0140

T. Schultz and A. Waibel, Language-independent and language-adaptive acoustic modeling for speech recognition, Speech Communication, vol.35, issue.1-2, pp.31-51.
DOI : 10.1016/S0167-6393(00)00094-7

C. E. Shannon, Prediction and entropy of printed English, 1951.

K. C. Sim and M. J. F. Gales, Discriminative semi-parametric trajectory model for speech recognition, Computer Speech & Language, vol.21, issue.4, pp.669-687, 2007.
DOI : 10.1016/j.csl.2007.03.004

. Su, Speaker time-drifting adaptation using trajectory mixture hidden Markov models Analysis of acoustic-phonetic variations in fluent speech using TIMIT, Proc. ICASSP Proc. ICASSP, pp.709-712, 1995.

W. X. Teng et al., Speaker adaptation by variable reference model subspace and application to large vocabulary speech recognition, Proc. ICASSP, pp.4381-4384, 2009.

D. T. Tran et al., Extension of uncertainty propagation to dynamic MFCCs for noise robust ASR, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5544-5548, 2014.
DOI : 10.1109/ICASSP.2014.6854656

URL : https://hal.archives-ouvertes.fr/hal-00954654

E. Vincent et al., The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.69-74, 2013.
DOI : 10.1109/ICASSP.2013.6637622

T. K. Vintsyuk, Speech discrimination by dynamic programming, Kibernetika, vol.4, issue.1, pp.52-57, 1968.

N. T. Vu et al., Multilingual Deep Neural Network based Acoustic Modeling For Rapid Language Adaptation, Proc. ICASSP, 2014.

H. Wakita, Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms, IEEE Transactions on Audio and Electroacoustics.

Cross likelihood ratio based speaker clustering using eigenvoice models, Proc. INTERSPEECH.

Speaker normalization on conversational telephone speech, Proc. ICASSP.

C. Wellekens, Explicit time correlation in hidden Markov models for speech recognition, Proc. ICASSP, 1987.

Rapid speaker adaptation by reference model interpolation, Proc. INTERSPEECH.

P. C. Woodland and D. Povey, Large scale discriminative training of hidden Markov models for speech recognition, Computer Speech & Language.

Y. Estève et al., The EPAC corpus: manual and automatic annotations of conversational speech in French broadcast news, Proc. LREC, pp.1686-1689, 2010.

G. Ye and B. Mak, Speaker-ensemble hidden Markov modeling for automatic speech recognition, Proc. ISCSLP, pp.6-10.

S. J. Young et al., Tree-based state tying for high accuracy acoustic modelling, Proc. HLT, 1994.

S. Young et al., The HTK book version 3.4.

Y. Zhang et al., An i-vector based approach to training data clustering for improved speech recognition, Proc. INTERSPEECH, pp.789-792, 2011.