Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991
High Accuracy Optical Flow Estimation Based on a Theory for Warping, ECCV, 2004. ,
DOI : 10.1007/978-3-540-24673-2_3
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.1732
Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification, ECCV, 2016. ,
DOI : 10.1007/s11263-013-0695-z
On the relationship between visual attributes and convolutional networks, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7298730
The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, 2010. ,
DOI : 10.1371/journal.pcbi.0040027
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.6629
Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010. ,
DOI : 10.1109/TPAMI.2009.167
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. ,
DOI : 10.1109/CVPR.2014.81
URL : http://arxiv.org/abs/1311.2524
Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7298676
Efficient hierarchical graph-based video segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010. ,
DOI : 10.1109/CVPR.2010.5539893
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.4979
Observing human-object interactions: Using spatial and functional compatibility for recognition, IEEE Trans. on PAMI, issue.2, 2009. ,
DOI : 10.1109/tpami.2009.83
ActivityNet: A large-scale video benchmark for human activity understanding, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7298698
URL : http://repository.kaust.edu.sa/kaust/bitstream/10754/556141/1/ActivityNet_CVPR2015.pdf
Analysing Domain Shift Factors between Videos and Images for Object Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.11, 2016. ,
DOI : 10.1109/TPAMI.2016.2551239
URL : https://hal.archives-ouvertes.fr/hal-01281069
Object Detection from Video Tubelets with Convolutional Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. ,
DOI : 10.1109/CVPR.2016.95
URL : http://arxiv.org/pdf/1604.04053
Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. ,
DOI : 10.1109/CVPR.2014.223
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.471.3312
A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008. ,
DOI : 10.5244/C.22.99
URL : https://hal.archives-ouvertes.fr/inria-00514853
ImageNet classification with deep convolutional neural networks, Communications of the ACM, vol.60, issue.6, 2017. ,
DOI : 10.1162/neco.2009.10-08-881
Attribute-based classification for zero-shot visual object categorization, IEEE Trans. on PAMI, issue.3, 2014. ,
DOI : 10.1109/tpami.2013.140
Action Recognition by Hierarchical Mid-Level Action Elements, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.517
URL : http://arxiv.org/abs/1508.07654
On space-time interest points, IJCV, 2005. ,
Microsoft COCO: Common objects in context, ECCV, 2014. ,
Recognizing human actions by attributes, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.5995353
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.463.8447
SSD: Single shot multibox detector, ECCV, 2016. ,
Visual Relationship Detection with Language Priors, ECCV, 2016. ,
DOI : 10.1023/B:VISI.0000029664.99615.94
URL : http://arxiv.org/abs/1608.00187
Action Recognition and Localization by Hierarchical Space-Time Segments, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.341
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.663.1492
Ensemble of exemplar-SVMs for object detection and beyond, 2011 International Conference on Computer Vision, 2011. ,
DOI : 10.1109/ICCV.2011.6126229
Deep captioning with multimodal recurrent neural networks (m-rnn), ICLR, 2015. ,
Scene recognition and weakly supervised object localization with deformable part-based models, 2011 International Conference on Computer Vision, 2011. ,
DOI : 10.1109/ICCV.2011.6126383
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.300.7841
Multi-region Two-Stream R-CNN for Action Detection, ECCV, 2016. ,
DOI : 10.1109/CVPR.2015.7298735
URL : https://hal.archives-ouvertes.fr/hal-01349107
Learning to Refine Object Segments, ECCV, 2016. ,
DOI : 10.5244/C.30.15
URL : http://arxiv.org/abs/1603.08695
Explicit Modeling of Human-Object Interactions in Realistic Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.4, 2013. ,
DOI : 10.1109/TPAMI.2012.175
URL : https://hal.archives-ouvertes.fr/inria-00626929
Learning object class detectors from weakly annotated video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012. ,
DOI : 10.1109/CVPR.2012.6248065
URL : https://hal.archives-ouvertes.fr/hal-00695940
Discovering discriminative action parts from mid-level video representations, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012. ,
DOI : 10.1109/CVPR.2012.6247807
URL : https://hal.archives-ouvertes.fr/hal-00918807
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015. ,
DOI : 10.1109/TPAMI.2016.2577031
URL : http://arxiv.org/abs/1506.01497
Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008. ,
DOI : 10.1109/CVPR.2008.4587727
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.8729
Recognition using visual phrases, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.5995711
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.226.5551
Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos, Proceedings of the British Machine Vision Conference 2016, 2016. ,
DOI : 10.5244/C.30.58
Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004. ,
DOI : 10.1109/ICPR.2004.1334462
Two-stream convolutional networks for action recognition in videos, NIPS, 2014. ,
Very deep convolutional networks for large-scale image recognition, ICLR, 2015. ,
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, CRCV-TR-12-01, 2012. ,
Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, 2013. ,
DOI : 10.1023/B:VISI.0000013087.49260.fb
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.3382
Sequence to Sequence -- Video to Text, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.515
URL : http://arxiv.org/abs/1505.00487
Show and tell: A neural image caption generator, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7298935
URL : http://arxiv.org/abs/1411.4555
Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001. ,
DOI : 10.1109/CVPR.2001.990517
A Robust and Efficient Video Representation for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, 2015. ,
DOI : 10.1109/ICCV.2013.442
URL : https://hal.archives-ouvertes.fr/hal-01145834
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, ECCV, 2016. ,
DOI : 10.1109/CVPR.2016.219
URL : http://arxiv.org/abs/1608.00859
Learning to Track for Spatio-Temporal Action Localization, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.362
URL : https://hal.archives-ouvertes.fr/hal-01159941
Actor-Action Semantic Segmentation with Grouping Process Models, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. ,
DOI : 10.1109/CVPR.2016.336
URL : http://arxiv.org/abs/1512.09041
Can humans fly? Action understanding with multiple classes of actors, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7298839
Human action recognition by learning bases of action attributes and parts, 2011 International Conference on Computer Vision, 2011. ,
DOI : 10.1109/ICCV.2011.6126386
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.227.6992
Describing Videos by Exploiting Temporal Structure, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.512
URL : http://arxiv.org/abs/1502.08029