L. Avdiyenko, N. Bertschinger, and J. Jost, Adaptive Sequential Feature Selection for Pattern Classification, IJCCI International Joint Conference of Computational Intelligence, pp.474-482, 2012.

D. Bahdanau, K. Cho, and Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, International Conference on Learning Representations, 2015.

E. Bengio, P.-L. Bacon, J. Pineau, and D. Precup, Conditional Computation in Neural Networks for Faster Models, International Conference on Learning Representations, Workshop Track, 2016.

Y. Bengio, N. Léonard, and A. Courville, Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013.

A. Cano, A. R. Masegosa, and S. Moral, A method for integrating expert knowledge when learning Bayesian networks from data, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol.41, issue.5, pp.1382-1394, 2011.
DOI : 10.1109/tsmcb.2011.2148197

R. Caruana, Multitask learning, Machine Learning, vol.28, issue.1, pp.41-75, 1997.

G. Chandrashekar and F. Sahin, A survey on feature selection methods, Computers and Electrical Engineering, vol.40, issue.1, pp.16-28, 2014.
DOI : 10.1016/j.compeleceng.2013.11.024

J. Chen, F. Lécué, J. Z. Pan, I. Horrocks, and H. Chen, Knowledge-based transfer learning explanation, Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, pp.349-358, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01934907

P. Daee, T. Peltola, M. Soare, and S. Kaski, Knowledge elicitation via sequential probabilistic inference for high-dimensional prediction, Machine Learning, vol.106, issue.9, pp.1599-1620, 2017.
DOI : 10.1007/s10994-017-5651-7
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-017-5651-7.pdf

J. Duchi, E. Hazan, and Y. Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

GoogleResearch, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

E. J. Gumbel, Statistical theory of extreme values and some practical applications: a series of lectures, US Govt. Print. Office, 1954.

I. Guyon and A. Elisseeff, An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

L. House, S. Leman, and C. Han, Bayesian visual analytics: BaVA, Statistical Analysis and Data Mining, vol.8, pp.1-13, 2015.
DOI : 10.1002/sam.11253

E. Jang, S. Gu, and B. Poole, Categorical Reparameterization with Gumbel-Softmax, International Conference on Learning Representations, pp.1-13, 2017.

D. P. Kingma and M. Welling, Auto-Encoding Variational Bayes, Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.

D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever et al., Improving Variational Inference with Inverse Autoregressive Flow, Conference on Neural Information Processing Systems, 2016.

R. Kohavi and G. H. John, Wrappers for feature subset selection, Artificial Intelligence, vol.97, issue.1-2, pp.273-324, 1997.
DOI : 10.1016/s0004-3702(97)00043-x
URL : https://doi.org/10.1016/s0004-3702(97)00043-x

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.
DOI : 10.1109/5.726791
URL : http://www.cs.berkeley.edu/~daf/appsem/Handwriting/papers/00726791.pdf

C. J. Maddison, A. Mnih, and Y. W. Teh, The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, International Conference on Learning Representations, 2017.

V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, Recurrent Models of Visual Attention, Advances in Neural Information Processing Systems, pp.2204-2212, 2014.

H. Raghavan, O. Madani, and R. Jones, Active learning with feedback on features and instances, Journal of Machine Learning Research, vol.7, pp.1655-1686, 2006.

J. Schulman, N. Heess, T. Weber, and P. Abbeel, Gradient Estimation Using Stochastic Computation Graphs, Advances in Neural Information Processing Systems, pp.3528-3536, 2015.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2011.

V. Tyagi and A. Mishra, A Survey on Different Feature Selection Methods for Microarray Data Analysis, International Journal of Computer Applications, vol.67, issue.16, 2013.

R. J. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.

K. Xu, A. Courville, R. S. Zemel, and Y. Bengio, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, International Conference on Machine Learning, pp.2048-2057, 2015.