K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01433235

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, 2018.

D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent et al., Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research, vol.11, pp.625-660, 2010.

A. Fouillet, V. Bousquet, I. Pontais, A. Gallay, and C. Caserio-Schönemann, The French emergency department OSCOUR network: Evaluation after a 10-year existence, Online Journal of Public Health Informatics, vol.7, issue.1, p.74, 2015.

J. Howard and S. Ruder, Universal language model fine-tuning for text classification, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.328-339, 2018.

J. Huang, C. Osorio, and L. W. Sy, An empirical evaluation of deep learning for ICD-9 code assignment using MIMIC-III clinical notes, Computer Methods and Programs in Biomedicine, vol.177, pp.141-153, 2019.

M. Li, Z. Fei, M. Zeng, F. Wu, Y. Li et al., Automated ICD-9 coding via a deep learning approach, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.16, issue.4, pp.1193-1202, 2019.

World Health Organization, International statistical classification of diseases and related health problems: 10th revision (ICD-10), 2015.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, Proceedings of NAACL-HLT, 2018.

D. M. Powers, Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation, 2011.

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, Improving language understanding by generative pre-training, 2018.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei et al., Language models are unsupervised multitask learners, OpenAI Blog, vol.1, issue.8, 2019.

S. Rothe, S. Narayan, and A. Severyn, Leveraging pre-trained checkpoints for sequence generation tasks, 2019.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is all you need, Advances in Neural Information Processing Systems, vol.30, pp.5998-6008, 2017.

Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov et al., XLNet: Generalized autoregressive pretraining for language understanding, 2019.