A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/pdf/1512.03385

C. Szegedy, W. Liu, Y. Jia, and P. Sermanet, , 2015.

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312, 2013.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.
DOI : 10.1038/nature14236

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3431-3440, 2015.
DOI : 10.1109/CVPR.2015.7298965

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in neural information processing systems, pp.91-99, 2015.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single Shot MultiBox Detector, European conference on computer vision, pp.21-37, 2016.
DOI : 10.1007/978-3-319-46448-0_2

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6645-6649, 2013.
DOI : 10.1109/ICASSP.2013.6638947

W. Yin, K. Kann, M. Yu, and H. Schütze, Comparative study of cnn and rnn for natural language processing, 2017.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1038/nature14539

H. A. Simon, Cognitive science: The newest science of the artificial, Cognitive Science, vol.4, issue.1, pp.33-46, 1980.

M. Wu, M. C. Hughes, S. Parbhoo, M. Zazzi, V. Roth et al., Beyond sparsity: Tree regularization of deep models for interpretability. arXiv preprint, 2017.

S. Ritter, D. G. T. Barrett, A. Santoro, and M. M. Botvinick, Cognitive psychology for deep neural networks: A shape bias case study, 2017.

H. Sak, A. Senior, and F. Beaufays, Long short-term memory recurrent neural network architectures for large scale acoustic modeling, Fifteenth annual conference of the international speech communication association, 2014.

Y. Miao, J. Li, Y. Wang, S. Zhang, and Y. Gong, Simplifying long short-term memory acoustic models for fast training and decoding, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2284-2288, 2016.
DOI : 10.1109/ICASSP.2016.7472084

P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, Natural language processing: an introduction, Journal of the American Medical Informatics Association, vol.18, issue.5, pp.544-551, 2011.

M. Liang and X. Hu, Recurrent convolutional neural network for object recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3367-3375, 2015.
DOI : 10.1109/CVPR.2015.7298958

S. Tripathi, Z. C. Lipton, S. Belongie, and T. Nguyen, Context Matters: Refining Object Detection in Video with Recurrent Neural Networks, Proceedings of the British Machine Vision Conference 2016, 2016.
DOI : 10.5244/C.30.44

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.
DOI : 10.1109/72.279181

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint, 2014.
DOI : 10.3115/v1/d14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Jonides, R. L. Lewis, D. E. Nee, C. A. Lustig et al., The Mind and Brain of Short-Term Memory, Annual Review of Psychology, vol.59, issue.1, pp.193-224, 2008.
DOI : 10.1146/annurev.psych.59.103006.093615

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins et al., Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences, vol.114, issue.13, pp.3521-3526, 2017.
DOI : 10.1073/pnas.1611835114
URL : http://www.pnas.org/content/114/13/3521.full.pdf

F. Zenke, B. Poole, and S. Ganguli, Improved multitask learning through synaptic intelligence. arXiv preprint, 2017.

K. Yi and S. Chen,

R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars, Memory aware synapses: Learning what (not) to forget. arXiv preprint, 2017.

T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng et al., Learning to Detect A Salient Object, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.353-367, 2011.
DOI : 10.1109/CVPR.2007.383047

H. P. Helgason, General attention mechanism for artificial intelligence systems, 2013.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate. arXiv preprint, 2014.

J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, Convolutional sequence to sequence learning, 2017.

Z. Yang, D. Yang, C. Dyer, X. He, A. J. Smola et al., Hierarchical Attention Networks for Document Classification, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.
DOI : 10.18653/v1/N16-1174

D. S. Chaplot, K. M. Sathyendra, R. K. Pasumarthi, D. Rajagopal, and R. Salakhutdinov, Gated-attention architectures for task-oriented language grounding, 2017.

B. Dhingra, H. Liu, Z. Yang, W. W. Cohen, and R. Salakhutdinov, Gated-Attention Readers for Text Comprehension, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017.
DOI : 10.18653/v1/P17-1168

V. Mnih, N. Heess, and A. Graves, Recurrent models of visual attention, Advances in neural information processing systems, pp.2204-2212, 2014.

J. Ba, V. Mnih, and K. Kavukcuoglu, Multiple object recognition with visual attention, Computer Science, 2014.

K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, Show, attend and tell: Neural image caption generation with visual attention, International Conference on Machine Learning, pp.2048-2057, 2015.

J. Kuen, Z. Wang, and G. Wang, Recurrent attentional networks for saliency detection. arXiv preprint, 2016.

G. Li and Y. Yu, Visual saliency based on multiscale deep features. arXiv preprint, 2015.

N. Liu, J. Han, D. Zhang, S. Wen, and T. Liu, Predicting eye fixations using convolutional neural networks, Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pp.362-370, 2015.

S. S. S. Kruthiventi, K. Ayush, and R. V. Babu, DeepFix: A fully convolutional neural network for predicting human eye fixations, IEEE Transactions on Image Processing, vol.26, issue.9, pp.4446-4456, 2017.

M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, Predicting human eye fixations via an lstm-based saliency attentive model, 2016.

I. Sorokin, A. Seleznev, M. Pavlov, A. Fedorov, and A. Ignateva, Deep attention recurrent Q-network, 2015.

J. Choi, B. Lee, and B. Zhang, Multi-focus attention network for efficient deep reinforcement learning, 2017.

B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, Building machines that learn and think like people. arXiv preprint, 2016.

B. Landau, L. B. Smith, and S. S. Jones, The importance of shape in early lexical learning, Cognitive Development, vol.3, issue.3, pp.299-321, 1988.
DOI : 10.1016/0885-2014(88)90014-7
URL : http://mind.cog.jhu.edu/faculty/landau/barbaraspapers/landau, smith, jones (1988).pdf

O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, Matching networks for one shot learning, Advances in Neural Information Processing Systems, pp.3630-3638, 2016.

R. Stewart and S. Ermon, Label-free supervision of neural networks with physics and domain knowledge, AAAI, pp.2576-2582, 2017.

Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, Harnessing Deep Neural Networks with Logic Rules, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
DOI : 10.18653/v1/P16-1228
URL : https://doi.org/10.18653/v1/p16-1228

K. Ganchev, J. Graça, J. Gillenwater, and B. Taskar, Posterior regularization for structured latent variable models, Journal of Machine Learning Research, vol.11, pp.2001-2049, 2010.

Z. Hu, Z. Yang, R. Salakhutdinov, and E. Xing, Deep Neural Networks with Massive Learned Knowledge, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.1670-1679, 2016.
DOI : 10.18653/v1/D16-1173

J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, vol.79, issue.8, pp.2554-2558, 1982.

D. Krotov and J. J. Hopfield, Dense associative memory for pattern recognition, CoRR, 2016.

J. Oh, V. Chockalingam, S. Singh, and H. Lee, Control of memory, active perception, and action in minecraft, 2016.

E. Parisotto and R. Salakhutdinov, Neural map: Structured memory for deep reinforcement learning. arXiv preprint, 2017.

H. Yin and S. J. Pan, Knowledge transfer for deep reinforcement learning with hierarchical experience replay, AAAI, pp.1640-1646, 2017.