Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, issue.1, pp.1-127, 2009.
DOI : 10.1561/2200000006

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

A. Mohamed, G. E. Dahl, and G. Hinton, Acoustic Modeling Using Deep Belief Networks, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.14-22, 2012.
DOI : 10.1109/TASL.2011.2109382

F. Seide, G. Li, X. Chen, and D. Yu, Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp.24-29, 2011.
DOI : 10.1109/ASRU.2011.6163899

B. Schölkopf and A. Smola, Learning with kernels, 2002.

L. Deng, G. Tür, X. He, and D. Z. Hakkani-Tür, Use of Kernel Deep Convex Networks and End-to-End Learning for Spoken Language Understanding, 2012 IEEE Spoken Language Technology Workshop (SLT), pp.210-215, 2012.

C. Cheng and B. Kingsbury, Arccosine kernels: Acoustic modeling with infinite neural networks, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5200-5203, 2011.
DOI : 10.1109/ICASSP.2011.5947529

A. Rahimi and B. Recht, Random Features for Large-Scale Kernel Machines, Advances in Neural Information Processing Systems, pp.1177-1184, 2007.

A. Smola, Personal communication, 2014.

D. DeCoste and B. Schölkopf, Training Invariant Support Vector Machines, Machine Learning, vol.46, issue.1/3, pp.161-190, 2002.
DOI : 10.1023/A:1012454411458

J. C. Platt, Fast Training of Support Vector Machines using Sequential Minimal Optimization, Advances in Kernel Methods: Support Vector Learning, 1998.

I. W. Tsang, J. T. Kwok, and P. Cheung, Core Vector Machines: Fast SVM Training on Very Large Data Sets, Journal of Machine Learning Research, vol.6, pp.363-392, 2005.

K. L. Clarkson, Coresets, Sparse Greedy Approximation, and the Frank-Wolfe Algorithm, ACM Trans. Algorithms, vol.6, issue.4, article 63, pp.63:1-63:30, 2010.

S. Sonnenburg and V. Franc, COFFIN: A Computational Framework for Linear SVMs, Proc. of the 27th Intl. Conf. on Mach. Learn. (ICML), pp.999-1006, 2010.

A. Rahimi and B. Recht, Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning, Advances in Neural Information Processing Systems, pp.1313-1320, 2008.

P. Kar and H. Karnick, Random Feature Maps for Dot Product Kernels, Proc. of the 15th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), 2012.

R. Hamid, Y. Xiao, A. Gittens, and D. DeCoste, Compact Random Feature Maps, Proc. of the 31st Intl. Conf. on Mach. Learn. (ICML), pp.19-27, 2014.

Q. V. Le, T. Sarlós, and A. Smola, Fastfood: Approximating Kernel Expansions in Loglinear Time, Proc. of the 30th Intl. Conf. on Mach. Learn. (ICML), 2013.

A. Vedaldi and A. Zisserman, Efficient Additive Kernels via Explicit Feature Maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.480-492, 2012.
DOI : 10.1109/TPAMI.2011.153

Z. Lu, A. May, K. Liu, A. Bagheri Garakani, D. Guo et al., How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets, 2014.

T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, B. Ramabhadran et al., Low-rank Matrix Factorization for Deep Neural Network Training with High-dimensional Output Targets, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6655-6659, 2013.

B. Kingsbury, Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3761-3764, 2009.
DOI : 10.1109/ICASSP.2009.4960445

T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak et al., Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition, Automatic Speech Recognition and Understanding (ASRU), pp.30-35, 2011.

F. Seide, G. Li, and D. Yu, Conversational Speech Transcription Using Context-Dependent Deep Neural Networks, Proc. of Interspeech, pp.437-440, 2011.

G. E. Hinton, S. Osindero, and Y.-W. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527