F. Seide, G. Li, and D. Yu, Conversational speech transcription using context-dependent deep neural networks, Proc. Interspeech, pp.437-440, 2011.

G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

K. Veselý, A. Ghoshal, L. Burget, and D. Povey, Sequence-discriminative training of deep neural networks, Proc. Interspeech, pp.2345-2349, 2013.

Y. Wang and D. L. Wang, Towards Scaling Up Classification-Based Speech Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.7, pp.1381-1390, 2013.
DOI : 10.1109/TASL.2013.2250961

P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, Deep learning for monaural speech separation, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1562-1566, 2014.
DOI : 10.1109/ICASSP.2014.6853860

F. Weninger, J. Le Roux, J. R. Hershey, and B. Schuller, Discriminatively trained recurrent neural networks for single-channel speech separation, Proc. GlobalSIP, 2014.

M. L. Seltzer, D. Yu, and Y. Wang, An investigation of noise robustness of deep neural networks, Proc. ICASSP, pp.7398-7402, 2013.

S. Renals and P. Swietojanski, Neural networks for distant speech recognition, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp.172-176, 2014.
DOI : 10.1109/HSCMA.2014.6843274

C. Weng, D. Yu, M. L. Seltzer, and J. Droppo, Single-channel mixed speech recognition using deep neural networks, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5632-5636, 2014.
DOI : 10.1109/ICASSP.2014.6854681

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, Proc. Interspeech, pp.1045-1048, 2010.

E. Arisoy, T. N. Sainath, B. Kingsbury, and B. Ramabhadran, Deep neural network language models, Proc. NAACL-HLT Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, pp.20-28, 2012.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.
DOI : 10.1109/5.726791
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.1115

L. Bottou, Online algorithms and stochastic approximations, Online Learning and Neural Networks, pp.9-42, 1998.

S. Becker and Y. LeCun, Improving the convergence of backpropagation learning with second order methods, Tech. Rep., 1988.

Y. LeCun, L. Bottou, G. Orr, and K. Müller, Efficient backprop, Neural Networks: Tricks of the Trade, pp.546-546, 1998.

A. Bordes, L. Bottou, and P. Gallinari, SGD-QN: Careful quasi-newton stochastic gradient descent, Journal of Machine Learning Research, vol.10, pp.1737-1754, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00750911

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Conference on Learning Theory, 2010.

S. Wiesler, A. Richard et al., Mean-normalized stochastic gradient for large-scale deep learning, Proc. ICASSP, pp.180-184, 2014.

M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang et al., On rectified linear units for speech processing, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3517-3521, 2013.
DOI : 10.1109/ICASSP.2013.6638312

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390294

J. de Leeuw, Block-relaxation Algorithms in Statistics, Information Systems and Data Analysis, pp.308-325, 1994.
DOI : 10.1007/978-3-642-46808-7_28

W. J. Heiser, Convergent computing by iterative majorization: theory and applications in multidimensional data analysis, Recent Advances in Descriptive Multivariate Analysis, pp.157-189, 1995.

M. P. Becker, I. Yang, and K. Lange, EM algorithms without missing data, Statistical Methods in Medical Research, vol.6, issue.1, pp.38-54, 1997.
DOI : 10.1177/096228029700600104
URL : https://deepblue.lib.umich.edu/bitstream/2027.42/68889/2/10.1177_096228029700600104.pdf

K. Lange, D. R. Hunter, and I. Yang, Optimization transfer using surrogate objective functions (with discussion), J. Comput. Graphical Stat., vol.9, pp.1-20, 2000.

D. R. Hunter and K. Lange, A Tutorial on MM Algorithms, The American Statistician, vol.58, issue.1, pp.30-37, 2004.
DOI : 10.1198/0003130042836

N. Ono, K. Miyamoto, J. Le Roux, H. Kameoka, and S. Sagayama, Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram, Proc. EUSIPCO, 2008.

N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.189-192, 2011.
DOI : 10.1109/ASPAA.2011.6082320

D. Böhning and B. G. Lindsay, Monotonicity of quadratic-approximation algorithms, Annals of the Institute of Statistical Mathematics, vol.40, issue.4, pp.641-663, 1988.
DOI : 10.1007/BF00049423

J. de Leeuw and K. Lange, Sharp quadratic majorization in one dimension, Computational Statistics & Data Analysis, vol.53, issue.7, pp.2471-2484, 2009.
DOI : 10.1016/j.csda.2009.01.002