R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural Language Processing (Almost) from Scratch, Journal of Machine Learning Research, vol.12, 2011.

R. De Mori, F. Bechet, D. Hakkani-Tur, M. McTear, G. Riccardi et al., Spoken Language Understanding: A Survey, IEEE Signal Processing Magazine, vol.25, pp.50-58, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01314884

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to Sequence Learning with Neural Networks, 2014.

D. Bahdanau, K. Cho, and Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

M. Collins, Three generative, lexicalised models for statistical parsing, Proceedings of ACL, pp.16-23, 1997.

W. M. Soon, H. T. Ng, and D. C. Lim, A Machine Learning Approach to Coreference Resolution of Noun Phrases, Computational Linguistics, vol.27, issue.4, pp.521-544, 2001.

V. Ng and C. Cardie, Improving Machine Learning Approaches to Coreference Resolution, Proceedings of ACL'02, pp.104-111, 2002.

C. Grouin, M. Dinarelli, S. Rosset, G. Wisniewski, and P. Zweigenbaum, Coreference Resolution in Clinical Reports. The LIMSI Participation in the i2b2/VA 2011 Challenge, Proceedings of i2b2/VA 2011 Coreference Resolution Workshop, 2011.

M. Dinarelli and S. Rosset, Tree Representations in Probabilistic Models for Extended Named Entity Detection, European Chapter of the Association for Computational Linguistics (EACL), pp.174-184, 2012.

M. Dinarelli, S. Rosset, N. C. Choukri, K. Declerck, T. Dogan et al., Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), European Language Resources Association (ELRA), 2012.

A. M. Rush, R. Reichart, M. Collins, and A. Globerson, Improved Parsing and POS Tagging Using Inter-sentence Consistency Constraints, Proceedings of EMNLP-CoNLL, 2012.

K. Lee, L. He, M. Lewis, and L. Zettlemoyer, End-to-end Neural Coreference Resolution, Proceedings of EMNLP, 2017.

G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, Neural architectures for named entity recognition, 2016.

X. Ma and E. Hovy, End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, Proceedings of ACL, 2016.

R. Kemker, M. McClure, A. Abitino, T. L. Hayes, and C. Kanan, Measuring catastrophic forgetting in neural networks, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

I. Augenstein, S. Ruder, and A. Søgaard, Multi-Task Learning of Pairwise Sequence Classification Tasks over Disparate Label Spaces, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.1896-1906, 2018.

O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever et al., Grammar as a Foreign Language, 2014.

M. Dinarelli, V. Vukotic, and C. Raymond, Label-dependency coding in Simple Recurrent Networks for Spoken Language Understanding, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01553830

P. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE, pp.1550-1560, 1990.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is All You Need, 2017.

V. Vukotic, C. Raymond, and G. Gravier, A step beyond local observations with a dialog aware bidirectional GRU network for Spoken Language Understanding, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01351733

J. P. Chiu and E. Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs, 2015.

Z. Huang, W. Xu, and K. Yu, Bidirectional LSTM-CRF models for sequence tagging, 2015.

Y. Dupont, M. Dinarelli, and I. Tellier, Label-Dependencies Aware Recurrent Neural Networks, Proceedings of CICLing, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01579071

M. Dinarelli and I. Tellier, Improving Recurrent Neural Networks For Sequence Labelling, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01489976

M. Dinarelli and I. Tellier, New Recurrent Neural Network Variants for Sequence Labeling, Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01489955

H. Bonneau-Maynard, C. Ayache, F. Bechet, A. Denis, A. Kuhn et al., Results of the French Evalda-Media evaluation campaign for literal understanding, LREC, pp.2054-2059, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01160167

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz, Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol.19, issue.2, 1993.

Y. Kim, C. Denton, L. Hoang, and A. M. Rush, Structured Attention Networks, 2017.

E. Simonnet, N. Camelin, P. Deléglise, and Y. Estève, Exploring the use of Attention-Based Recurrent Neural Networks For Spoken Language Understanding, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01433202

J. L. Elman, Finding structure in time, Cognitive Science, vol.14, issue.2, 1990.

M. I. Jordan, Serial Order: A Parallel, Distributed Processing Approach, 1989.

Y. Bengio, P. Simard, and P. Frasconi, Learning Long-term Dependencies with Gradient Descent is Difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.

G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng et al., Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding, IEEE/ACM TASLP, 2015.

Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, A Neural Probabilistic Language Model, Journal of Machine Learning Research, vol.3, pp.1137-1155, 2003.

C. Raymond and G. Riccardi, Generative and Discriminative Algorithms for Spoken Language Understanding, 2007.

M. Dinarelli, A. Moschitti, and G. Riccardi, Concept Segmentation And Labeling For Conversational Speech, Proceedings of the International Conference of the Speech Communication Association (Interspeech), 2009.

S. Hahn, M. Dinarelli, C. Raymond, F. Lefèvre, P. Lehnen et al., Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages, IEEE TASLP, p.99, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00746965

M. Dinarelli and S. Rosset, Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding, Conference of Empirical Methods for Natural Language Processing, pp.1104-1115, 2011.

M. Dinarelli, A. Moschitti, and G. Riccardi, Discriminative Reranking for Spoken Language Understanding, IEEE TASLP, vol.20, pp.526-539, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01478984

V. N. Vapnik, Statistical Learning Theory, 1998.

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of ICML, 2001.

G. Mesnil, X. He, L. Deng, and Y. Bengio, Investigation of Recurrent-Neural-Network Architectures and Learning Methods for Spoken Language Understanding, Interspeech, 2013.

K. Yao, G. Zweig, M. Hwang, Y. Shi, and D. Yu, Recurrent Neural Networks for Language Understanding, 2013.

V. Vukotic, C. Raymond, and G. Gravier, Is it time to switch to Word Embedding and Recurrent Neural Networks for Spoken Language Understanding?, Interspeech, 2015.

K. Yao, B. Peng, Y. Zhang, D. Yu, G. Zweig et al., Spoken Language Understanding Using Long Short-Term Memory Neural Networks, 2014.

C. Wang, Network of Recurrent Neural Networks, 2017.

J. H. Holland, Emergence: From Chaos to Order, 1999.

W. B. Arthur, On the Evolution of Complexity. Working Papers, pp.93-104, 1993.

O. Levy, K. Lee, N. Fitzgerald, and L. Zettlemoyer, Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum, Proceedings of ACL, pp.732-739, 2018.

M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, and L. Kaiser, Universal Transformers, 2018.

Y. Zhang, H. Chen, Y. Zhao, Q. Liu, and D. Yin, Learning Tag Dependencies for Sequence Tagging, International Joint Conference on Artificial Intelligence (IJCAI), 2018.

L. Ramshaw and M. Marcus, Text chunking using transformation-based learning, Proceedings of the 3rd Workshop on Very Large Corpora, pp.84-94, 1995.

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision, ICCV 2015, pp.1026-1034, 2015.

Y. Bengio, Practical recommendations for gradient-based training of deep architectures, CoRR abs/1206.5533, 2012.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, 2017.

A. Gruslys, R. Munos, I. Danihelka, M. Lanctot, and A. Graves, Memory-Efficient Backpropagation Through Time, 2016.

A. Yeh, More Accurate Tests for the Statistical Significance of Result Differences, Proceedings of Coling, pp.947-953, 2000.

S. Padó, J. Pennington, R. Socher, and C. D. Manning, User's guide to sigf: Significance testing by approximate randomisation, Empirical Methods in Natural Language Processing (EMNLP), vol.63, pp.1532-1543, 2006.