D. Bahdanau, P. Brakel, K. Xu, A. Goyal, R. Lowe et al., An actor-critic algorithm for sequence prediction, 2017.

M. Ballesteros, Y. Goldberg, C. Dyer, and N. Smith, Training with exploration improves a greedy stack-LSTM parser, EMNLP, 2016.

S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, Scheduled sampling for sequence prediction with recurrent neural networks, NIPS, 2015.

A. Beygelzimer, H. Daumé III, J. Langford, and P. Mineiro, Learning reductions that really work, Proceedings of the IEEE, 2016.
DOI : 10.1109/jproc.2015.2494118
URL : http://arxiv.org/pdf/1502.02704

M. Cettolo, J. Niehues, S. Stüker, L. Bentivogli, and M. Federico, Report on the 11th IWSLT evaluation campaign, Proceedings of IWSLT, 2014.

K. Chang, A. Krishnamurthy, A. Agarwal, H. Daumé III, and J. Langford, Learning to search better than your teacher, ICML, 2015.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, EMNLP, 2014.
DOI : 10.3115/v1/d14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235

H. Daumé III and D. Marcu, Learning as search optimization: approximate large margin methods for structured prediction, ICML, 2005.

H. Daumé III, J. Langford, and D. Marcu, Search-based structured prediction, Machine Learning, 2009.

K. Gimpel and N. A. Smith, Softmax-margin CRFs: Training log-linear models with cost functions, NAACL, 2010.

Y. Goldberg and J. Nivre, A dynamic oracle for arc-eager dependency parsing, 2012.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

J. Goodman, A. Vlachos, and J. Naradowsky, Noise reduction and targeted exploration in imitation learning for abstract meaning representation parsing, ACL, 2016.

T. Hazan and R. Urtasun, A primal-dual message-passing algorithm for approximated large scale structured prediction, NIPS, 2010.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 1997.

S. Jean, K. Cho, R. Memisevic, and Y. Bengio, On using very large target vocabulary for neural machine translation, ACL, 2015.
DOI : 10.3115/v1/p15-1001
URL : https://doi.org/10.3115/v1/p15-1001

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, ICLR, 2015.

M. Kääriäinen, Lower bounds for reductions, Talk at the Atomic Learning Workshop (TTI-C), 2006.

Y. Lee, Y. Lin, and G. Wahba, Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data, Journal of the American Statistical Association, 2004.

M. Norouzi, S. Bengio, Z. Chen, N. Jaitly, M. Schuster et al., Reward augmented maximum likelihood for neural structured prediction, NIPS, 2016.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, Bleu: a method for automatic evaluation of machine translation, ACL, 2002.

P. Pletscher, C. S. Ong, and J. M. Buhmann, Entropy and margin maximization for structured output learning, ECML PKDD, 2010.
DOI : 10.1007/978-3-642-15939-8_6
URL : https://link.springer.com/content/pdf/10.1007%2F978-3-642-15939-8_6.pdf

M. Ranzato, S. Chopra, M. Auli, and W. Zaremba, Sequence level training with recurrent neural networks, ICLR, 2016.

S. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel, Self-critical sequence training for image captioning, 2016.
DOI : 10.1109/cvpr.2017.131
URL : http://arxiv.org/pdf/1612.00563

S. Ross and J. A. Bagnell, Reinforcement and imitation learning via interactive no-regret learning, 2014.

S. Shen, Y. Cheng, Z. He, W. He, H. Wu et al., Minimum risk training for neural machine translation, 2016.
DOI : 10.18653/v1/p16-1159
URL : https://doi.org/10.18653/v1/p16-1159

W. Sun, A. Venkatraman, G. J. Gordon, B. Boots, and J. A. Bagnell, Deeply AggreVaTeD: Differentiable imitation learning for sequential prediction, 2017.

I. Sutskever, O. Vinyals, and Q. Le, Sequence to sequence learning with neural networks, NIPS, 2014.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, CVPR, 2016.

B. Taskar, C. Guestrin, and D. Koller, Max-margin Markov networks, NIPS, 2003.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, JMLR, 2005.

O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, Show and tell: A neural image caption generator, CVPR, 2015.

S. Wiseman and A. Rush, Sequence-to-sequence learning as beam-search optimization, EMNLP, 2016.