K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, EMNLP, 2014.

R. Collobert and J. Weston, A unified architecture for natural language processing, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390177

Y. Dauphin, A. Fan, M. Auli, and D. Grangier, Language modeling with gated convolutional networks, ICML, 2017.

Y. Deng, Y. Kim, J. Chiu, D. Guo, and A. Rush, Latent alignment and variational attention, arXiv preprint, 2018.

S. Edunov, M. Ott, M. Auli, D. Grangier, and M. Ranzato, Classical Structured Prediction Losses for Sequence to Sequence Learning, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018.
DOI : 10.18653/v1/N18-1033

J. Gehring, M. Auli, D. Grangier, and Y. Dauphin, A Convolutional Encoder Model for Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017.
DOI : 10.18653/v1/P17-1012

J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. Dauphin, Convolutional sequence to sequence learning, ICML, 2017.

A. Graves, Sequence transduction with recurrent neural networks, 2012.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

G. Huang, Z. Liu, L. van der Maaten, and K. Weinberger, Densely Connected Convolutional Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.243

URL : http://arxiv.org/pdf/1608.06993

P. Huang, C. Wang, S. Huang, D. Zhou, and L. Deng, Towards neural phrase-based machine translation, ICLR, 2018.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, ICML, 2015.

S. Jean, K. Cho, R. Memisevic, and Y. Bengio, On Using Very Large Target Vocabulary for Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
DOI : 10.3115/v1/P15-1001

N. Kalchbrenner, I. Danihelka, and A. Graves, Grid long short-term memory, ICLR, 2016.

N. Kalchbrenner, L. Espeholt, K. Simonyan, A. van den Oord, A. Graves et al., Neural machine translation in linear time, arXiv preprint, 2016.

N. Kalchbrenner, E. Grefenstette, and P. Blunsom, A Convolutional Neural Network for Modelling Sentences, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014.
DOI : 10.3115/v1/P14-1062

Y. Kim, Convolutional Neural Networks for Sentence Classification, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1181

D. Kingma and J. Ba, Adam: A method for stochastic optimization, ICLR, 2015.

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico et al., Moses: Open Source Toolkit for Statistical Machine Translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, 2007.
DOI : 10.3115/1557769.1557821

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1038/nature14539

Z. Lin, M. Feng, C. dos Santos, M. Yu, B. Xiang et al., A structured self-attentive sentence embedding, ICLR, 2017.

T. Luong, H. Pham, and C. Manning, Effective Approaches to Attention-based Neural Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
DOI : 10.18653/v1/D15-1166

F. Meng, Z. Lu, M. Wang, H. Li, W. Jiang et al., Encoding Source Language with Convolutional Neural Network for Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
DOI : 10.3115/v1/P15-1003

V. Nair and G. Hinton, Rectified linear units improve restricted Boltzmann machines, ICML, 2010.

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., WaveNet: a generative model for raw audio, ISCA Speech Synthesis Workshop, 2016.

A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, Pixel recurrent neural networks, ICML, 2016.

A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves et al., Conditional image generation with PixelCNN decoders, NIPS, 2016.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: a Method for Automatic Evaluation of Machine Translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL '02, 2002.
DOI : 10.3115/1073083.1073135

A. Parikh, O. Täckström, D. Das, and J. Uszkoreit, A Decomposable Attention Model for Natural Language Inference, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016.
DOI : 10.18653/v1/D16-1244

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, NIPS-W, 2017.

M. Ranzato, S. Chopra, M. Auli, and W. Zaremba, Sequence level training with recurrent neural networks, ICLR, 2016.

S. Reed, A. van den Oord, N. Kalchbrenner, S. Gómez Colmenarejo, Z. Wang et al., Parallel multiscale autoregressive density estimation, ICML, 2017.

T. Salimans, A. Karpathy, X. Chen, and D. Kingma, PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications, ICLR, 2017.

M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, vol.45, issue.11, pp.2673-2681, 1997.
DOI : 10.1109/78.650093

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
DOI : 10.18653/v1/P16-1162

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, JMLR, vol.15, pp.1929-1958, 2014.

I. Sutskever, O. Vinyals, and Q. Le, Sequence to sequence learning with neural networks, NIPS, 2014.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is all you need, NIPS, 2017.

C. Wang, Y. Wang, P. Huang, A. Mohamed, D. Zhou et al., Sequence modeling via segmentations, ICML, 2017.

L. Wu, Y. Xia, L. Zhao, F. Tian, T. Qin et al., Adversarial neural machine translation, arXiv preprint, 2017.

K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville et al., Show, attend and tell: Neural image caption generation with visual attention, ICML, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01466414