D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, ICLR, 2015.

D. Bahdanau, P. Brakel, K. Xu, A. Goyal, R. Lowe et al., An actor-critic algorithm for sequence prediction, ICLR, 2017.

M. Ballesteros, Y. Goldberg, C. Dyer, and N. A. Smith, Training with Exploration Improves a Greedy Stack LSTM Parser, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016.
DOI : 10.18653/v1/D16-1211

A. Beygelzimer, H. Daumé, J. Langford, and P. Mineiro, Learning Reductions That Really Work, Proceedings of the IEEE, pp.136-147, 2016.
DOI : 10.1109/JPROC.2015.2494118

K. Chang, A. Krishnamurthy, A. Agarwal, H. Daumé, J. Iii et al., Learning to search better than your teacher, ICML, 2015.

K. Cho, B. Van-merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning Phrase Representations using RNN Encoder???Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1179

URL : https://hal.archives-ouvertes.fr/hal-01433235

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, Journal of Machine Learning Research, vol.12, pp.2493-2537, 2011.

H. Daumé, D. Iii, and . Marcu, Learning as search optimization, Proceedings of the 22nd international conference on Machine learning , ICML '05, 2005.
DOI : 10.1145/1102351.1102373

H. Daumé, I. , J. Langford, and D. Marcu, Search-based structured prediction, Machine Learning, 2009.
DOI : 10.1007/s10994-009-5106-x

Y. Golberg and J. Nivre, A dynamic oracle for arc-eager dependency parsing, 2012.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.4, issue.8, 1997.
DOI : 10.1016/0893-6080(88)90007-X

S. Jean, K. Cho, R. Memisevic, and Y. Bengio, On Using Very Large Target Vocabulary for Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
DOI : 10.3115/v1/P15-1001

M. Luong, H. Pham, and C. D. Manning, Effective Approaches to Attention-based Neural Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
DOI : 10.18653/v1/D15-1166

URL : http://aclweb.org/anthology/D/D15/D15-1166.pdf

P. Pletscher, C. S. Ong, and J. M. Buhmann, Entropy and Margin Maximization for Structured Output Learning, ECML PKDD, 2010.
DOI : 10.1007/978-3-642-15939-8_6

URL : http://www.pletscher.org/papers/pletscher2010maxentmarg.pdf

M. Ranzato, S. Chopra, M. Auli, and W. Zaremba, Sequence level training with recurrent neural networks, 2016.

S. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel, Self-Critical Sequence Training for Image Captioning, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2017.131

S. Ross and J. A. , Reinforcement and imitation learning via interactive no-regret learning, 2014.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

W. Sun, A. Venkatraman, G. J. Gordon, B. Boots, and J. A. , Deeply aggrevated: Differentiable imitation learning for sequential prediction, 2017.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, 2014.

B. Taskar, C. Guestrin, and D. Koller, Max-margin Markov networks, NIPS, 2003.

E. F. Tjong, K. Sang, and S. Buchholz, Introduction to the CoNLL-2000 shared task: Chunking, CoNLL, 2000.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, JMLR, 2005.

O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, Show and tell: A neural image caption generator, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298935

S. Wiseman and A. M. Rush, Sequence-to-Sequence Learning as Beam-Search Optimization, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016.
DOI : 10.18653/v1/D16-1137

URL : http://arxiv.org/abs/1606.02960