Finding approximate local minima faster than gradient descent, STOC, 2017. ,
DOI : 10.1145/3055399.3055464
URL : http://arxiv.org/pdf/1611.01146
Deep Speech 2 : End-to-end speech recognition in English and Mandarin, ICML, 2016. ,
Efficient approaches for escaping higher order saddle points in non-convex optimization, COLT, 2016. ,
Globally normalized transition-based neural networks, ACL, 2016. ,
DOI : 10.18653/v1/p16-1231
URL : https://doi.org/10.18653/v1/p16-1231
A closer look at memorization in deep networks, ICML, 2017. ,
Do deep nets really need to be deep, NIPS, 2014. ,
Maximum mutual information estimation of hidden Markov model parameters for speech recognition, ICASSP, 1986. ,
For valid generalization the size of the weights is more important than the size of the network, NIPS, 1996. ,
Scaling learning algorithms towards AI. Large-Scale Kernel Machines, vol.34, pp.1-41, 2007. ,
On the complexity of neural network classifiers: A comparison between shallow and deep architectures, IEEE Trans. Neural Netw. Learning Syst, vol.25, issue.8, pp.1553-1565, 2014. ,
Large-Scale Kernel Machines, 2007. ,
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP, pp.4960-4964, 2016. ,
Efficient one-vs-one kernel ridge regression for speech recognition, ICASSP, 2016. ,
State-of-the-art speech recognition with sequence-to-sequence models, ICASSP, 2018. ,
Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks, AISTATS, 2015. ,
Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Trans. Algorithms, vol.6, issue.4, 2010. ,
Approximation by superpositions of a sigmoidal function, MCSS, vol.2, issue.4, pp.303-314, 1989. ,
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio, Speech & Language Processing, vol.20, issue.1, pp.30-42, 2012. ,
DOI : 10.1109/tasl.2011.2134090
URL : http://www.cs.toronto.edu/%7Egdahl/papers/DRAFT_DBN4LVCSR-TransASLP.pdf
Scalable kernel methods via doubly stochastic gradients, NIPS, 2014. ,
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, NIPS, 2014. ,
Training invariant support vector machines, Machine Learning, vol.46, pp.161-190, 2002. ,
Frontend factor analysis for speaker verification, IEEE Trans. Audio, Speech & Language Processing, vol.19, issue.4, pp.788-798, 2011. ,
Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, vol.10, pp.2899-2934, 2009. ,
NIST Rich Transcription evaluation data. Linguistic Data Consortium, 2003. ,
Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, vol.12, issue.2, pp.75-98, 1998. ,
The application of hidden Markov models in speech recognition, Foundations and Trends in Signal Processing, vol.1, issue.3, pp.195-304, 2007. ,
TIMIT acoustic phonetic continuous speech corpus. Linguistic Data Consortium, 1993. ,
Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition, INTERSPEECH, 2006. ,
Understanding the difficulty of training deep feedforward neural networks, AISTATS, 2010. ,
Deep Learning. Adaptive computation and machine learning, 2016. ,
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks, 2006. ,
Compact random feature maps, ICML, 2014. ,
Learning both weights and connections for efficient neural network, NIPS, 2015. ,
Nonparametric and semiparametric models, 2004. ,
Deep residual learning for image recognition, CVPR, 2016. ,
DOI : 10.1109/cvpr.2016.90
URL : http://arxiv.org/pdf/1512.03385
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, p.29, 2012. ,
Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989. ,
DOI : 10.1016/0893-6080(89)90020-8
, Vikas Sindhwani, and Bhuvana Ramabhadran. Kernel methods match deep neural networks on TIMIT, 2014.
A simple proof of Stirling's formula for the gamma function. The Mathematical Gazette, vol.99, pp.68-74, 2015. ,
A novel loss function for the overall risk criterion based discriminative training of HMM models, INTERSPEECH, 2000. ,
Random feature maps for dot product kernels, AISTATS, 2012. ,
Lattice-based optimization of sequence classification criteria for neuralnetwork acoustic modeling, ICASSP, 2009. ,
A high-performance Cantonese keyword search system, ICASSP, 2013. ,
Imagenet classification with deep convolutional neural networks, NIPS, 2012. ,
DOI : 10.1145/3065386
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf
Fastfood -computing Hilbert space expansions in loglinear time, ICML, 2013. ,
A comparison between deep neural nets and kernel acoustic models for speech recognition, ICASSP, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01329772
Compact kernel models for acoustic modeling via random feature selection, ICASSP, 2016. ,
Universal kernels, Journal of Machine Learning Research, vol.7, pp.2651-2667, 2006. ,
Recurrent neural network based language model, INTERSPEECH, 2010. ,
Efficient estimation of word representations in vector space, ICLR Workshop, 2013. ,
Acoustic modeling using deep belief networks, IEEE Trans. Audio, Speech & Language Processing, vol.20, issue.1, pp.14-22, 2012. ,
On the number of linear regions of deep neural networks, NIPS, 2014. ,
Generalization and parameter estimation in feedforward nets: Some experiments, NIPS, 1990. ,
In search of the real inductive bias: On the role of implicit regularization in deep learning, ICLR (Workshop), 2015. ,
Geometry of neural network loss surfaces via random matrix theory, ICML, 2017. ,
Spherical random features for polynomial kernels, NIPS, 2015. ,
Fast Training of Support Vector Machines using Sequential Minimal Optimization, Advances in Kernel Methods -Support Vector Learning, 1998. ,
Evaluation of proposed modifications to MPE for large scale discriminative training, ICASSP, 2007. ,
Minimum phone error and I-smoothing for improved discriminative training, ICASSP, 2002. ,
Boosted MMI for model and feature-space discriminative training, ICASSP, 2008. ,
Purely sequence-trained neural networks for ASR based on lattice-free MMI, INTERSPEECH, 2016. ,
Random features for large-scale kernel machines, NIPS, 2007. ,
Petr Fousek, Petr Novák, and Abdel-rahman Mohamed. Making deep belief networks effective for large vocabulary continuous speech recognition, ASRU, 2011. ,
Ebru Arisoy, and Bhuvana Ramabhadran. Low-rank matrix factorization for deep neural network training with highdimensional output targets, ICASSP, 2013. ,
Optimization techniques to improve training speed of deep neural networks for large speech tasks, IEEE Trans. Audio, Speech & Language Processing, vol.21, issue.11, pp.2267-2276, 2013. ,
Deep convolutional neural networks for LVCSR, ICASSP, 2013. ,
DOI : 10.1109/icassp.2013.6639347
Long short-term memory recurrent neural network architectures for large scale acoustic modeling, INTERSPEECH, 2014. ,
The IBM 2016 English conversational telephone speech recognition system, INTERSPEECH, 2016. ,
English conversational telephone speech recognition by humans and machines, INTERSPEECH, 2017. ,
DOI : 10.21437/interspeech.2017-405
URL : http://arxiv.org/pdf/1703.02136
Learning with kernels, 2002. ,
Feature engineering in context-dependent deep neural networks for conversational speech transcription, ASRU, 2011. ,
Conversational speech transcription using contextdependent deep neural networks, In INTERSPEECH, 2011. ,
Advances in very deep convolutional neural networks for LVCSR, INTERSPEECH, 2016. ,
Very deep convolutional networks for large-scale image recognition, ICLR, 2015. ,
The IBM Attila speech recognition toolkit, SLT, 2010. ,
Joint training of convolutional and non-convolutional neural networks, In ICASSP, 2014. ,
COFFIN: a computational framework for linear SVMs, ICML, 2010. ,
Sparseness of support vector machines-some asymptotically sharp bounds, NIPS, 2003. ,
Sparse connection and pruning in large dynamic artificial neural networks, EUROSPEECH, 1997. ,
LSTM neural networks for language modeling, INTERSPEECH, 2012. ,
DOI : 10.1109/taslp.2015.2400218
Sequence to sequence learning with neural networks, NIPS, 2014. ,
Core vector machines: Fast SVM training on very large data sets, Journal of Machine Learning Research, vol.6, pp.363-392, 2005. ,
MMIE training of large vocabulary recognition systems, Speech Communication, vol.22, issue.4, pp.303-314, 1997. ,
Training variance and performance evaluation of neural networks in speech, 2017. ,
Efficient additive kernels via explicit feature maps, IEEE Trans. Pattern Anal. Mach. Intell, vol.34, issue.3, pp.480-492, 2012. ,
DOI : 10.1109/tpami.2011.153
Sequence-discriminative training of deep neural networks, In INTERSPEECH, 2013. ,
Using the Nyström method to speed up kernel machines, NIPS, 2000. ,
Diverse neural network learns true target functions, AISTATS, 2017. ,
Toward human parity in conversational speech recognition, IEEE/ACM Trans. Audio, Speech & Language Processing, vol.25, issue.12, pp.2410-2423, 2017. ,
Restructuring of deep neural network acoustic models with singular value decomposition, INTERSPEECH, 2013. ,
, Deep fried convnets. In ICCV, 2015.
DOI : 10.1109/iccv.2015.173
Sparse random feature algorithm as coordinate descent in Hilbert space, NIPS, 2014. ,
, Compact nonlinear maps and circulant extensions, 2015.
Understanding deep learning requires rethinking generalization, 2017. ,