TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org ,
Parsing by chunks, In: Principle-based parsing, pp.257-278, 1991. ,
Trecvid 2016: Evaluating video search, video event detection, localization, and hyperlinking, Proceedings of TRECVID, 2016. ,
Neural machine translation by jointly learning to align and translate, 2014. ,
Theano: new features and speed improvements, Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012. ,
Surf: Speeded up robust features, pp.404-417, 2006. ,
DOI : 10.1007/11744023_32
Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, pp.157-166, 1994. ,
DOI : 10.1109/72.279181
Theano: a CPU and GPU math expression compiler, Proceedings of the Python for scientific computing conference (SciPy), p.3, 2010. ,
Latent Dirichlet Allocation, J. Mach. Learn. Res, vol.3, pp.993-1022, 2003. ,
Exploiting Multimodality in Video Hyperlinking to Improve Target Diversity, International Conference on Multimedia Modeling, pp.185-197, 2017. ,
DOI : 10.1007/s10994-010-5198-3
URL : https://hal.archives-ouvertes.fr/hal-01498130
IRISA at TRECVid2015: Leveraging Multimodal LDA for Video Hyperlinking, Proc. of TRECVID, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01403726
Semantic Annotation of the French Media Dialog Corpus, 2005. ,
Comparing Semantic Models for Evaluating Automatic Document Summarization, Text, Speech, and Dialogue, 2015. ,
DOI : 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
Unsupervised Learning Algorithms, 2016. ,
Multimodal sparse representation learning and applications, CoRR abs/1511.06238, 2015. ,
Infogan: Interpretable representation learning by information maximizing generative adversarial nets, Advances in Neural Information Processing Systems, pp.2172-2180, 2016. ,
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. ,
DOI : 10.3115/v1/D14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235
Xception: Deep Learning with Depthwise Separable Convolutions, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. ,
DOI : 10.1109/CVPR.2017.195
Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014. ,
Expanding the scope of the ATIS task, Proceedings of the workshop on Human Language Technology, HLT '94, pp.43-48, 1994. ,
DOI : 10.3115/1075812.1075823
Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.886-893, 2005. ,
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
Front-End Factor Analysis for Speaker Verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.4, pp.788-798, 2011. ,
DOI : 10.1109/TASL.2010.2064307
Label-Dependency Coding in Simple Recurrent Networks for Spoken Language Understanding, Interspeech 2017, 2017. ,
DOI : 10.21437/Interspeech.2017-1480
URL : https://hal.archives-ouvertes.fr/hal-01553830
Learning a Deep Convolutional Network for Image Super-Resolution, European Conference on Computer Vision, pp.184-199, 2014. ,
DOI : 10.1007/978-3-319-10593-2_13
Finding structure in time, Cognitive Science, vol.14, pp.179-211, 1990. ,
Multimodal Video-to-Video Linking: Turning to the Crowd for Insight and Evaluation, Proc. of the 23rd International Conference on Multimedia Modeling, 2017. ,
DOI : 10.1145/2483977.2483988
The Search and Hyperlinking Task at MediaEval, 2014. ,
Cross-modal retrieval with correspondence autoencoder, ACM Intl. Conf. on Multimedia, pp.7-16, 2014. ,
DOI : 10.1145/2647868.2654902
Predicting Object Dynamics in Scenes, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.2019-2026 ,
DOI : 10.1109/CVPR.2014.260
A Neural Algorithm of Artistic Style, Journal of Vision, vol.16, issue.12, 2015. ,
DOI : 10.1167/16.12.326
The LIMSI Broadcast News transcription system, Speech Communication, vol.37, issue.1-2, pp.89-108, 2002. ,
DOI : 10.1016/S0167-6393(01)00061-9
URL : https://hal.archives-ouvertes.fr/hal-01434493
Learning to forget: Continual prediction with LSTM, Neural computation, vol.12, issue.10, pp.2451-2471, 2000. ,
Generative adversarial nets, Advances in neural information processing systems, pp.2672-2680, 2014. ,
HITS and IRISA at MediaEval 2013: Search and hyperlinking task, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00906249
A Comparison of Various Methods for Concept Tagging for Spoken Language Understanding ,
URL : https://hal.archives-ouvertes.fr/hal-01321122
Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.6, pp.1569-1583, 2010. ,
DOI : 10.1109/TASL.2010.2093520
URL : https://hal.archives-ouvertes.fr/hal-00746965
Semantic processing using the Hidden Vector State model, Computer Speech & Language, vol.19, issue.1, pp.85-106, 2005. ,
DOI : 10.1016/j.csl.2004.03.001
Long Short-Term Memory, Neural computation, vol.9, pp.1735-1780, 1997. ,
DOI : 10.1162/neco.1997.9.8.1735
Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network, pp.597-606, 2015. ,
Action-Reaction: Forecasting the Dynamics of Human Interaction, pp.489-504, 2014. ,
DOI : 10.1007/978-3-319-10584-0_32
Generating images with recurrent adversarial networks, 2016. ,
Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3304-3311, 2010. ,
DOI : 10.1109/CVPR.2010.5540039
Fast and Accurate Content-based Semantic Search in 100M Internet Videos, Proceedings of the 23rd ACM international conference on Multimedia, MM '15, pp.49-58 ,
DOI : 10.1111/j.1467-9868.2005.00532.x
Perceptual Losses for Real-Time Style Transfer and Super-Resolution, CoRR, 2016. ,
DOI : 10.1007/978-3-642-27413-8_47
Serial order: A parallel distributed processing approach, In: Advances in psychology, vol.121, pp.471-495, 1997. ,
Adam: A method for stochastic optimization, 2014. ,
Activity forecasting, pp.201-214, 2012. ,
Anticipating human activities using object affordances for reactive robotic response, pp.14-29, 2016. ,
ImageNet classification with deep convolutional neural networks, Communications of the ACM, vol.60, issue.6, pp.1097-1105, 2012. ,
DOI : 10.1145/3065386
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf
Chunking with Support Vector Machines, Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies. NAACL '01, pp.1-8, 2001. ,
Leveraging Sentence-level Information with Encoder LSTM for Natural Language Understanding, 2016. ,
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, International Conference on Machine Learning, pp.282-289, 2001. ,
Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects, Proceedings of the IEEE, pp.1449-1477, 2015. ,
DOI : 10.1109/JPROC.2015.2460697
URL : https://hal.archives-ouvertes.fr/hal-01179853
A Hierarchical Representation for Future Action Prediction, pp.689-704, 2014. ,
DOI : 10.1007/978-3-319-10578-9_45
Boosting bonsai trees for efficient features combination : application to speaker role identification, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01025171
Practical Very Large Scale CRFs, Proceedings the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pp.504-513, 2010. ,
Distributed Representations of Sentences and Documents, In: ICML, vol.14, pp.1188-1196, 2014. ,
Is Deep Learning Really Necessary for Word Embeddings?, Tech. rep., 2013. ,
Comparison of learning algorithms for handwritten digit recognition, In: International conference on artificial neural networks, vol.60, pp.53-60, 1995. ,
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. ,
DOI : 10.1109/CVPR.2017.19
SIFT Flow: Dense Correspondence Across Scenes and Its Applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, pp.978-994, 2011. ,
DOI : 10.1007/978-3-319-23048-1_2
Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3431-3440, 2015. ,
DOI : 10.1109/CVPR.2015.7298965
Object recognition from local scale-invariant features, In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol.2, pp.1150-1157, 1999. ,
Semantic Retrieval of Personal Photos Using a Deep Autoencoder Fusing Visual Features with Speech Annotations Represented as Word/Paragraph Vectors, Annual Conf. of the Intl. Speech Communication Association, 2015. ,
Adversarial autoencoders, 2015. ,
Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding, 14th Annual Conference of the International Speech Communication Association, pp.3771-3775, 2013. ,
A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.1615-1630, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00548227
Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 2013. ,
Conditional generative adversarial nets, 2014. ,
Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2016.383
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. ,
DOI : 10.1109/CVPR.2017.374
Pixel Recurrent Neural Networks, CoRR, 2016. ,
Conditional image generation with pixelcnn decoders, CoRR, 2016. ,
Context Encoders: Feature Learning by Inpainting, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2536-2544, 2016. ,
DOI : 10.1109/CVPR.2016.278
Glove: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1532-1543, 2014. ,
DOI : 10.3115/v1/D14-1162
Invertible Conditional GANs for image editing, 2016. ,
Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007. ,
DOI : 10.1109/CVPR.2007.383266
Improving the Fisher Kernel for Large-Scale Image Classification, pp.143-156, 2010. ,
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630
Making a Case for Learning Motion Representations with Phase, 2016. ,
DOI : 10.1145/2185520.2185561
Unsupervised representation learning with deep convolutional generative adversarial networks, CoRR, 2015. ,
Video (language) modeling: a baseline for generative models of natural videos, CoRR, 2014. ,
Generative and Discriminative Algorithms for Spoken Language Understanding, In: InterSpeech. Antwerp, Belgium, pp.1605-1608, 2007. ,
Generative adversarial text to image synthesis, CoRR, 2016. ,
Generative adversarial text to image synthesis, Proceedings of The 33rd International Conference on Machine Learning, 2016. ,
Artistic Style Transfer for Videos, CoRR, 2016. ,
DOI : 10.1109/TVCG.2011.51
ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV), vol.115, issue.3, pp.211-252, 2015. ,
Temporal Generative Adversarial Nets, 2016. ,
Introduction to modern information retrieval, 1986. ,
BoosTexter: A boosting-based system for text Categorization, Machine Learning, vol.39, pp.135-168, 2000. ,
Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), pp.32-36, 2004. ,
DOI : 10.1109/ICPR.2004.1334462
Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, pp.2673-2681, 1997. ,
DOI : 10.1109/78.650093
CNN features off-the-shelf: an astounding baseline for recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.806-813, 2014. ,
Semantic structuring of video collections from speech: segmentation and hyperlinking, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01253678
Very deep convolutional networks for large-scale image recognition, 2014. ,
Cross-language linking of news stories on the web using interlingual topic modelling, Proc. of ACM Workshop on Social Web Search and Mining, 2009. ,
Semi-supervised recursive autoencoders for predicting sentiment distributions, Proceedings of the conference on empirical methods in natural language processing, pp.151-161, 2011. ,
Striving for simplicity: The all convolutional net, CoRR abs/1412.6806, 2014. ,
Probabilistic Topic Models, pp.424-440, 2007. ,
DOI : 10.4324/9780203936399.ch21
Sequence to sequence learning with neural networks, In: Advances in neural information processing systems, pp.3104-3112, 2014. ,
Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-9, 2015. ,
DOI : 10.1109/CVPR.2015.7298594
Rethinking the Inception Architecture for Computer Vision, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2818-2826, 2016. ,
DOI : 10.1109/CVPR.2016.308
Multi-view 3D Models from Single Images with a Convolutional Network, pp.322-337, 2016. ,
DOI : 10.1109/ICCV.2015.123
A Testbed for Cross-Dataset Analysis, CoRR abs/1402.5923, 2014. ,
DOI : 10.1007/978-3-319-16199-0_2
What is left to be understood in ATIS?, 2010 IEEE Spoken Language Technology Workshop, pp.19-24, 2010. ,
DOI : 10.1109/SLT.2010.5700816
Grammar as a foreign language, Advances in Neural Information Processing Systems, pp.2755-2763, 2015. ,
Anticipating the future by watching unlabeled video, CoRR, 2015. ,
Generating videos with scene dynamics, pp.613-621, 2016. ,
A Step Beyond Local Observations with a Dialog Aware Bidirectional GRU Network for Spoken Language Understanding, Interspeech 2016, 2016. ,
DOI : 10.21437/Interspeech.2016-1301
URL : https://hal.archives-ouvertes.fr/hal-01351733
Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp.343-346, 2016. ,
Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking, Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR '17, 2017. ,
DOI : 10.1145/2983563.2983567
URL : https://hal.archives-ouvertes.fr/hal-01522419
Is it time to switch to Word Embedding and Recurrent Neural Networks for Spoken Language Understanding, In: InterSpeech, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01196915
Multimodal and crossmodal representation learning from textual and visual features with bidirectional deep neural networks for video hyperlinking, Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, ACM, pp.37-44, 2016. ,
One-Step Time-Dependent Future Video Frame Prediction with a Convolutional Encoder-Decoder Neural Network, 19th International Conference on Image Analysis and Processing (ICIAP), 2017. ,
Dense Optical Flow Prediction from a Static Image, 2015 IEEE International Conference on Computer Vision (ICCV), pp.2443-2451, 2015. ,
DOI : 10.1109/ICCV.2015.281
Patch to the Future: Unsupervised Visual Prediction, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.3302-3309, 2014. ,
DOI : 10.1109/CVPR.2014.416
An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders, pp.835-851, 2016. ,
Generative Image Modeling Using Style and Structure Adversarial Networks, European Conference on Computer Vision, pp.318-335, 2016. ,
DOI : 10.1109/CVPR.2016.309
Towards ai-complete question answering: A set of prerequisite toy tasks, 2015. ,
Show, attend and tell: Neural image caption generation with visual attention, International Conference on Machine Learning, pp.2048-2057, 2015. ,
Recurrent Neural Networks for Language Understanding, 2013. ,
Spoken language understanding using long short-term memory neural networks, 2014 IEEE Spoken Language Technology Workshop (SLT) ,
DOI : 10.1109/SLT.2014.7078572
EventNet, Proceedings of the 23rd ACM international conference on Multimedia, MM '15, pp.471-480 ,
DOI : 10.1109/CVPR.2014.20
Semantic Image Inpainting with Perceptual and Contextual Losses, 2016. ,
A Data-Driven Approach for Event Prediction, pp.707-720, 2010. ,
DOI : 10.1007/978-3-642-15552-9_51
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, 2016. ,
Robust LSTM-Autoencoders for Face De-Occlusion in the Wild, IEEE Transactions on Image Processing, vol.27, issue.2, 2016. ,
DOI : 10.1109/TIP.2017.2771408
Multi-Task Cross-Lingual Sequence Tagging from Scratch ,
A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking, IEEE MultiMedia Special Issue: Vision and Language Integration Meets Multimedia Fusion, 2018. ,
One-Step Time-Dependent Future Video Frame Prediction with a Convolutional Encoder-Decoder Neural Network, Intl. Conf. on Image Analysis and Processing, 2017. ,
Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking, ACM International Conference on Multimedia Retrieval, 2017. ,
Exploiting Multimodality in Video Hyperlinking to Improve Target Diversity, International Conference on Multimedia Modeling, 2017. ,
DOI : 10.1007/s10994-010-5198-3
URL : https://hal.archives-ouvertes.fr/hal-01498130
One-Step Time-Dependent Future Video Frame Prediction with a Convolutional Encoder-Decoder Neural Network, Netherlands Conference on Computer Vision, 2016. ,
A step beyond local observations with a dialog aware bidirectional GRU network for Spoken Language Understanding, Annual Conf. of the Intl. Speech Communication Association (Interspeech), 2016. ,
Multimodal and Crossmodal Representation Learning from Textual and Visual Features with Bidirectional Deep Neural Networks for Video Hyperlinking, ACM Multimedia 2016 Workshop: Vision and Language Integration Meets Multimedia Fusion, 2016. ,
Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications, ACM International Conference on Multimedia Retrieval, 2016. ,
Is it time to switch to Word Embedding and Recurrent Neural Networks for Spoken Language Understanding, Annual Conf. of the Intl. Speech Communication Association (Interspeech), 2015. ,