Unsupervised learning from narrated instruction videos, CVPR. 43, vol.44, p.139, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01171193
Joint Discovery of Object States and Manipulation Actions, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01676084
2D human pose estimation: New benchmark and state of the art analysis, 2014. ,
Multiscale combinatorial grouping, CVPR, vol.38, p.111, 2014. ,
Sequential deep learning for human action recognition, International Workshop on Human Behavior Understanding, p.34, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-01354493
DIFFRAC: A discriminative and flexible framework for clustering, NIPS. 42, vol.47, p.117, 2007. ,
Social scene understanding: End-to-end multi-person action localization and collective activity recognition, CVPR, vol.38, p.76, 2017. ,
Neural machine translation by jointly learning to align and translate, 2014. ,
Speeded-up robust features (SURF), Computer Vision and Image Understanding, p.61, 2008. ,
SURF: Speeded up robust features, ECCV, p.27, 2006. ,
POOF: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation, 2013. ,
Fully-convolutional siamese networks for object tracking, BMVC, p.86, 2016. ,
Finding actors and actions in movies, ICCV. 42, vol.48, p.139, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00904991
Weakly supervised action labeling in videos under ordering constraints, ECCV. 43, vol.49, p.141, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01053967
Weakly-supervised alignment of video with text, ICCV, p.43, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01154523
Online learning and stochastic approximations, p.33, 1998. ,
High accuracy optical flow estimation based on a theory for warping, ECCV, vol.29, p.84, 2004. ,
SST: Single-stream temporal action proposals, 2017. ,
Fast temporal activity proposals for efficient detection of human actions in untrimmed videos, 2016. ,
Deep clustering for unsupervised learning of visual features, 2018. ,
Quo vadis, action recognition? A new model and the Kinetics dataset, CVPR, vol.34, p.118, 2017. ,
The devil is in the details: an evaluation of recent feature encoding methods, BMVC, vol.59, p.61, 2011. ,
Return of the devil in the details: Delving deep into convolutional nets, 2014. ,
Action Detection by implicit intentional Motion Clustering, ICCV, vol.44, p.112, 2015. ,
Articulated pose estimation by a graphical model with image dependent pairwise relations, NIPS, vol.33, p.137, 2014. ,
Mixing body-part sequences for human pose estimation, CVPR, vol.35, p.70, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00978643
A flexible model for training action localization with varying levels of supervision, NIPS, vol.20, p.140, 2018. ,
P-CNN: Pose-based CNN features for action recognition, ICCV, vol.20, p.138, 2015. ,
Modeling spatio-temporal human track structure for action localization, vol.20, p.23, 2018. ,
Detecting parts for action localization, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01573629
Learning phrase representations using RNN encoder-decoder for statistical machine translation, vol.75, p.80, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01433235
Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01110036
Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014. ,
Support-vector networks, Machine Learning, p.30, 1995. ,
Visual categorization with bags of keypoints, ECCV workshop, p.29, 2004. ,
Compact representation of bidirectional texture functions, 2001. ,
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.73, p.79, 2012. ,
Histograms of oriented gradients for human detection, CVPR, vol.27, p.61, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00548512
Human detection using oriented histograms of flow and appearance, ECCV, vol.29, p.61, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00548587
Sympathy for the details: dense trajectories and hybrid classification architectures for action recognition, vol.73, p.76, 2016. ,
ImageNet: A large-scale hierarchical image database, vol.57, p.137, 2009. ,
Long-term recurrent convolutional networks for visual recognition and description, vol.71, p.74, 2015. ,
The Yael library, ACM Multimedia, p.61, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01020695
Discovering localized attributes for fine-grained recognition, 2012. ,
Automatic annotation of human actions in video, vol.112, p.139, 2009. ,
DAPs: Deep action proposals for action understanding, 2016. ,
Two-frame motion estimation based on polynomial expansion, SCIA, p.61, 2003. ,
A Bayesian hierarchical model for learning natural scene categories, 2005. ,
Spatiotemporal residual networks for video action recognition, NIPS, vol.33, p.76, 2016. ,
Convolutional two-stream network fusion for video action recognition, CVPR, vol.33, p.76, 2016. ,
Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM, vol.30, p.61, 1981. ,
An algorithm for quadratic programming, Naval Research Logistics Quarterly, vol.50, p.118, 1956. ,
Actom sequence models for efficient action detection, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00575217
Fast R-CNN, ICCV, vol.75, p.85, 2015. ,
Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, vol.33, p.34, 2014. ,
Finding action tubes, CVPR, vol.38, p.111, 2015. ,
THUMOS challenge: Action recognition with a large number of classes, p.76, 2015. ,
Speech recognition with deep recurrent neural networks, 2013. ,
AVA: A video dataset of spatiotemporally localized atomic visual actions, CVPR. 39, 98, 99, vol.100, p.127, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01764300
A combined corner and edge detector, Alvey vision conference, p.27, 1988. ,
Deep residual learning for image recognition, CVPR, vol.97, p.118, 2016. ,
ActivityNet: A large-scale video benchmark for human activity understanding, vol.35, p.40, 2015. ,
Joint segmentation and classification of human actions in video, 2011. ,
Long short-term memory, 1997. ,
Determining optical flow, Artificial Intelligence, vol.29, 1981. ,
Tube convolutional neural network (T-CNN) for action detection in videos, ICCV, vol.39, p.111, 2017. ,
Connectionist Temporal Modeling for Weakly Supervised Action Labeling, ECCV. 43, vol.112, p.141, 2016. ,
The THUMOS challenge on action recognition for videos "in the wild", Computer Vision and Image Understanding, p.40, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01431525
Action localization with tubelets from motion, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00996844
Aggregating local image descriptors into compact codes, 2012. ,
URL : https://hal.archives-ouvertes.fr/inria-00633013
Towards understanding action recognition, ICCV, vol.35, p.137, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00906902
3D convolutional neural networks for human action recognition, 2010. ,
Discriminative clustering for image cosegmentation, CVPR, vol.48, p.113, 2010. ,
Action tubelet detector for spatio-temporal action localization, ICCV. 39, 40, 75, vol.126, p.136, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01519812
Joint learning of object and action detectors, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01575804
ContextLocNet: Context-aware deep network models for weakly supervised localization, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01421772
Deep visual-semantic alignments for generating image descriptions, CVPR, vol.74, p.79, 2015. ,
Large-scale video classification with convolutional neural networks, 2014. ,
The kinetics human action video dataset, vol.16, p.137, 2017. ,
Efficient visual event detection using volumetric features, ICCV, vol.38, p.111, 2005. ,
Adam: A method for stochastic optimization, 2014. ,
A spatio-temporal descriptor based on 3D-gradients, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00514853
ImageNet classification with deep convolutional neural networks, NIPS, vol.32, p.57, 2012. ,
HMDB: a large video database for human motion recognition, ICCV. 13, vol.17, p.54, 2011. ,
Block-coordinate Frank-Wolfe optimization for structural SVMs, ICML. 51, vol.113, p.118, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00720158
On space-time interest points, IJCV, vol.27, p.28, 2005. ,
Modeling and visual recognition of human actions and interactions, Habilitation à diriger des recherches, 2013. ,
URL : https://hal.archives-ouvertes.fr/tel-01064540
Local velocity-adapted motion events for spatio-temporal recognition, Computer Vision and Image Understanding, vol.27, p.28, 2007. ,
Local descriptors for spatio-temporal recognition, ECCV workshop, p.27, 2004. ,
Learning realistic human actions from movies, CVPR, vol.55, p.61, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00548659
Retrieving actions in movies, ICCV, vol.29, p.111, 2007. ,
Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, CVPR, vol.29, p.30, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00548585
Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.33, p.54, 1998. ,
Efficient backprop, Neural networks: Tricks of the trade, vol.33, 1998. ,
Microsoft COCO: Common objects in context, 2014. ,
Spatio-temporal LSTM with trust gates for 3D human action recognition, 2016. ,
SSD: Single shot multibox detector, vol.76, p.97, 2016. ,
Object recognition from local scale-invariant features, ICCV, p.27, 1999. ,
Visual relationship detection with language priors, 2016. ,
An iterative image registration technique with an application to stereo vision, IJCAI, vol.29, p.118, 1981. ,
Learning activity progression in LSTMs for activity detection and early detection, CVPR, vol.38, p.76, 2016. ,
Learning object representations for visual object class recognition, 2007. ,
Face detection without bells and whistles, 2014. ,
Localizing Actions from Video labels and Pseudo-Annotations, BMVC, vol.44, p.126, 2017. ,
Spot on: Action localization from pointly-supervised proposals, ECCV. 45, 99, vol.110, p.125, 2016. ,
Learning from video and text via large-scale discriminative clustering, vol.113, p.118, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01569540
Learning a text-video embedding from incomplete and heterogeneous data, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01975102
A performance evaluation of local descriptors, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00548227
Stacked hourglass networks for human pose estimation, ECCV, vol.33, p.137, 2016. ,
Beyond short snippets: Deep networks for video classification, CVPR, vol.34, p.91, 2015. ,
Multiple granularity analysis for fine-grained action detection, 2014. ,
Sampling strategies for bag-of-features image classification, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00203752
Spatio-temporal object detection proposals, ECCV, vol.38, p.111, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01021902
Action and event recognition with Fisher vectors on a compact feature set, vol.61, p.66, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00873662
Learning and transferring mid-level image representations using convolutional neural networks, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00911179
Minding the gaps for block Frank-Wolfe optimization of structured SVMs, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01323727
Multi-region two-stream R-CNN for action detection, HAL. 38, 75, vol.76, p.111, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01349107
Action recognition with stacked Fisher vectors, 2014. ,
Large-scale image retrieval with compressed Fisher vectors, 2010. ,
Improving the Fisher kernel for large-scale image classification, vol.31, p.61, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00548630
Weakly-supervised learning of visual relations, ICCV, vol.17, p.138, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01576035
Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video, IJCV, vol.34, p.76, 2017. ,
Parsing videos of actions with segmental grammars, 2014. ,
Poselet conditioned pictorial structures, CVPR, vol.35, p.55, 2013. ,
Explicit modeling of human-object interactions in realistic videos, 2013. ,
URL : https://hal.archives-ouvertes.fr/inria-00626929
Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS, vol.85, p.118, 2015. ,
Weakly supervised action learning with RNN-based fine-to-coarse modeling, CVPR. 43, vol.112, p.141, 2017. ,
Action MACH: a spatio-temporal maximum average correlation height filter for action recognition, 2008. ,
A database for fine grained activity detection of cooking activities, CVPR. 13, vol.36, p.68, 2012. ,
Recognizing fine-grained and composite activities using hand-centric features and script data, 2016. ,
Learning internal representations by error propagation, Parallel distributed processing, p.33, 1985. ,
ImageNet large scale visual recognition challenge, IJCV, vol.32, p.118, 2015. ,
AMTnet: Action-micro-tube regression by end-to-end trainable deep architecture, ICCV, vol.39, p.111, 2017. ,
Deep learning for detecting multiple space-time action tubes in videos, BMVC. 38, 39, 41, 75, vol.76, p.136, 2016. ,
MODEC: Multimodal decomposable models for human pose estimation, CVPR, vol.35, p.59, 2013. ,
Parsing human motion with stretchable models, CVPR, vol.35, p.55, 2011. ,
Local grayvalue invariants for image retrieval, 1997. ,
URL : https://hal.archives-ouvertes.fr/inria-00548358
Recognizing human actions: A local SVM approach, ICPR, vol.27, p.55, 2004. ,
Temporal action localization in untrimmed videos via multi-stage CNNs, CVPR, vol.38, p.139, 2016. ,
Asynchronous temporal fields for action recognition, 2017. ,
Fisher vector faces in the wild, BMVC, vol.29, p.31, 2013. ,
Two-stream convolutional networks for action recognition in videos, NIPS, vol.33, p.73, 2014. ,
A multi-stream bi-directional recurrent neural network for fine-grained action detection, CVPR, vol.34, p.76, 2016. ,
Online real time multiple spatiotemporal action localisation and prediction, vol.39, p.118, 2017. ,
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization, ICCV, vol.43, p.112, 2017. ,
Weakly Supervised Action Detection, BMVC, vol.44, p.112, 2011. ,
Video Google: A text retrieval approach to object matching in videos, 2003. ,
Unsupervised action discovery and localization in videos, ICCV, vol.44, p.112, 2017. ,
UCF101: A dataset of 101 human actions classes from videos in the wild, vol.17, p.137, 2012. ,
Unsupervised learning of video representations using LSTMs, 2016. ,
Generating text with recurrent neural networks, 2011. ,
DeepFace: Closing the gap to human-level performance in face verification, 2014. ,
Motion words for videos, 2014. ,
Convolutional learning of spatio-temporal features, 2010. ,
Learning video object segmentation with visual memory, ICCV, vol.80, p.89, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01511145
Joint training of a convolutional network and a graphical model for human pose estimation, NIPS, vol.54, p.55, 2014. ,
DeepPose: Human pose estimation via deep neural networks, CVPR, vol.33, p.137, 2014. ,
Learning spatiotemporal features with 3D convolutional networks, 2015. ,
Selective search for object recognition, IJCV, vol.38, p.111, 2013. ,
APT: Action localization proposals from dense trajectories, BMVC, vol.38, p.111, 2015. ,
Long-term temporal convolutions for action recognition, TPAMI, vol.33, p.76, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01241518
Show and tell: A neural image caption generator, 2015. ,
Action recognition by dense trajectories, vol.31, p.135, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00583818
Dense trajectories and motion boundary descriptors for action recognition, IJCV, vol.29, p.61, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00803241
Action recognition with improved trajectories, ICCV, vol.61, p.66, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00873267
UntrimmedNets for Weakly Supervised Action Recognition and Detection, CVPR, vol.43, p.112, 2017. ,
Convolutional pose machines, CVPR, vol.33, p.137, 2016. ,
Learning to track for spatiotemporal action localization, ICCV, vol.38, p.111, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01159941
Human action localization with sparse spatial supervision, vol.127, p.136, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01317558
DeepFlow: Large displacement optical flow with deep matching, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00873592
Can humans fly? Action understanding with multiple classes of actors, 2015. ,
Maximum margin clustering, NIPS. 43, vol.47, p.114, 2004. ,
Common action discovery and localization in unconstrained videos, ICCV, vol.44, p.112, 2017. ,
Articulated pose estimation with flexible mixtures-of-parts, CVPR, vol.35, p.65, 2011. ,
End-to-end learning of action detection from frame glimpses in videos, vol.74, p.76, 2016. ,
Temporal action localization with pyramid of score distribution features, CVPR, vol.38, p.76, 2016. ,
Beyond short snippets: Deep networks for video classification, CVPR, vol.33, p.55, 2015. ,
A duality based approach for realtime TV-L1 optical flow, Joint Pattern Recognition Symposium, p.29, 2007. ,
Temporal action detection with structured segment networks, 2017. ,
Interaction part mining: A mid-level approach for fine-grained action recognition, vol.67, p.68, 2015. ,
Pipelining localized semantic features for fine-grained action recognition, 2014. ,
Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection, vol.99, p.111, 2017. ,