Convolutional Neural Networks for Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.10, pp.1533-1545, 2014. ,
DOI : 10.1109/TASLP.2014.2339736
Do deep nets really need to be deep?, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, pp.2654-2662, 2014. ,
Maximum mutual information estimation of hidden Markov model parameters for speech recognition, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.49-52, 1986. ,
DOI : 10.1109/ICASSP.1986.1169179
Localized Rademacher complexities, Proceedings of the 15th Annual Conference on Computational Learning Theory, COLT '02, pp.44-58, 2002. ,
On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures, IEEE Transactions on Neural Networks and Learning Systems, vol.25, issue.8, pp.1553-1565, 2014. ,
DOI : 10.1109/TNNLS.2013.2293637
Arccosine kernels: Acoustic modeling with infinite neural networks, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5200-5203, 2011. ,
DOI : 10.1109/ICASSP.2011.5947529
Coresets, Sparse Greedy Approximation, and the Frank-Wolfe Algorithm, ACM Transactions on Algorithms, vol.6, issue.4, pp.63:1-63:30, 2010. ,
Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, vol.2, issue.4, pp.303-314, 1989. ,
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.30-42, 2012. ,
DOI : 10.1109/tasl.2011.2134090
Scalable kernel methods via doubly stochastic gradients, Advances in Neural Information Processing Systems 27, 2014. ,
Training Invariant Support Vector Machines, Machine Learning, vol.46, issue.1/3, pp.161-190, 2002. ,
DOI : 10.1023/A:1012454411458
Use of kernel deep convex networks and end-to-end learning for spoken language understanding, 2012 IEEE Spoken Language Technology Workshop (SLT), pp.210-215, 2012. ,
DOI : 10.1109/SLT.2012.6424224
Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, vol.10, pp.2899-2934, 2009. ,
NIST Rich Transcription evaluation data. https://catalog.ldc.upenn, Linguistic Data Consortium Catalog No. LDC2007S10, 2003. ,
The Application of Hidden Markov Models in Speech Recognition, Foundations and Trends® in Signal Processing, vol.1, issue.3, pp.195-304, 2007. ,
DOI : 10.1561/2000000004
Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, vol.12, issue.2, pp.75-98, 1998. ,
DOI : 10.1006/csla.1998.0043
URL : http://svr-www.eng.cam.ac.uk/~mjfg/lintran_CSL.ps.gz
DARPA TIMIT acoustic phonetic continuous speech corpus CDROM, 1993. ,
Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition, INTERSPEECH 2006 - ICSLP, Ninth International Conference on Spoken Language Processing, 2006. ,
Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, pp.249-256, 2010. ,
Compact random feature maps, Proceedings of the 31st International Conference on Machine Learning, ICML 2014, 21-26 June 2014. ,
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012. ,
DOI : 10.1109/MSP.2012.2205597
A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006. ,
DOI : 10.1162/neco.2006.18.7.1527
URL : http://www.cs.berkeley.edu/~ywteh/research/ebm/nc2006.pdf
Kernel methods match deep neural networks on TIMIT, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.205-209, 2014. ,
Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, 6-11 July 2015. ,
A simple proof of Stirling's formula for the gamma function, The Mathematical Gazette, vol.99, issue.544, pp.68-74, 2015. ,
DOI : 10.2307/2323256
A novel loss function for the overall risk criterion based discriminative training of HMM models, Sixth International Conference on Spoken Language Processing, pp.887-890, 2000. ,
Random feature maps for dot product kernels, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2012, pp.583-591, 2012. ,
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3761-3764, 2009. ,
DOI : 10.1109/ICASSP.2009.4960445
Fastfood: approximating kernel expansions in loglinear time, Proceedings of the 30th International Conference on Machine Learning, ICML 2013, 16-21 June 2013. ,
A stochastic gradient method with an exponential convergence rate for finite training sets, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, pp.2672-2680, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00674995
A comparison between deep neural nets and kernel acoustic models for speech recognition, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5070-5074, 2016. ,
DOI : 10.1109/ICASSP.2016.7472643
URL : https://hal.archives-ouvertes.fr/hal-01329772
Compact kernel models for acoustic modeling via random feature selection, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2424-2428, 2016. ,
DOI : 10.1109/ICASSP.2016.7472112
Acoustic Modeling Using Deep Belief Networks, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.14-22, 2012. ,
DOI : 10.1109/TASL.2011.2109382
On the number of linear regions of deep neural networks, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, pp.2924-2932, 2014. ,
Generalization and parameter estimation in feedforward nets: Some experiments, Advances in Neural Information Processing Systems 2, 1990. ,
Spherical random features for polynomial kernels, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, pp.1846-1854, 2015. ,
Fast Training of Support Vector Machines using Sequential Minimal Optimization, Advances in Kernel Methods - Support Vector Learning, 1998. ,
Minimum phone error and i-smoothing for improved discriminative training, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.105-108, 2002. ,
DOI : 10.1109/icassp.2002.1005687
URL : https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenSemester1_2007_8/povey_mpe.pdf
Boosted MMI for model and feature-space discriminative training, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4057-4060, 2008. ,
DOI : 10.1109/ICASSP.2008.4518545
Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, pp.321-324, 2007. ,
DOI : 10.1109/ICASSP.2007.366914
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI, Interspeech 2016, 2016. ,
DOI : 10.21437/Interspeech.2016-595
Random features for large-scale kernel machines, Advances in Neural Information Processing Systems 20: Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, pp.1177-1184, 2007. ,
Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning, Advances in Neural Information Processing Systems 21: Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, pp.1313-1320, 2008. ,
Low-rank Matrix Factorization for Deep Neural Network Training with High-dimensional Output Targets, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6655-6659, 2013. ,
Making Deep Belief Networks effective for large vocabulary continuous speech recognition, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp.30-35, 2011. ,
DOI : 10.1109/ASRU.2011.6163900
URL : http://www.cs.toronto.edu/%7Easamir/papers/asru11.pdf
Optimization techniques to improve training speed of deep neural networks for large speech tasks, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.11, pp.2267-2276, 2013. ,
Long short-term memory recurrent neural network architectures for large scale acoustic modeling, INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, pp.338-342, 2014. ,
Learning with kernels, 2002. ,
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp.24-29, 2011. ,
DOI : 10.1109/ASRU.2011.6163899
Conversational speech transcription using context-dependent deep neural networks, 12th Annual Conference of the International Speech Communication Association, pp.437-440, 2011. ,
The IBM Attila speech recognition toolkit, 2010 IEEE Spoken Language Technology Workshop, pp.97-102, 2010. ,
DOI : 10.1109/SLT.2010.5700829
COFFIN: A computational framework for linear SVMs, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.999-1006, 2010. ,
Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014. ,
Sparseness of support vector machines - some asymptotically sharp bounds, Advances in Neural Information Processing Systems 16, 2004. ,
On the importance of initialization and momentum in deep learning, Proceedings of the 30th International Conference on Machine Learning, ICML 2013, 16-21 June 2013. ,
Core Vector Machines: Fast SVM Training on Very Large Data Sets, Journal of Machine Learning Research, vol.6, pp.363-392, 2005. ,
MMIE training of large vocabulary recognition systems, Speech Communication, vol.22, issue.4, pp.303-314, 1997. ,
DOI : 10.1016/S0167-6393(97)00029-0
Efficient Additive Kernels via Explicit Feature Maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.480-492, 2012. ,
DOI : 10.1109/TPAMI.2011.153
URL : http://eprints.pascal-network.org/archive/00006964/01/vedaldi10.pdf
Sequence-discriminative training of deep neural networks, 14th Annual Conference of the International Speech Communication Association, pp.2345-2349, 2013. ,
Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems 13, pp.682-688, 2001. ,
Achieving human parity in conversational speech recognition ,
DOI : 10.1109/taslp.2017.2756440
Sparse random feature algorithm as coordinate descent in Hilbert space, Advances in Neural Information Processing Systems 27, 2014. ,
Compact nonlinear maps and circulant extensions ,